Fine-grained, aspect-based sentiment analysis on economic and financial lexicon

Sergio Consoli, Luca Barbaglia, and Sebastiano Manzan

European Commission, Joint Research Centre (DG-JRC)

Directorate A-Strategy, Work Programme and Resources Scientific Development Unit

Via E. Fermi 2749, I-21027 Ispra (VA), Italy.

Email: [name.surname]@ec.europa.eu

Abstract

The last two decades have seen a tremendous increase in the adoption of Semantic Web technologies as a result of the availability of big data, the growth in computational power and the advancement of artificial intelligence (AI) technologies. Cutting-edge semantic techniques are now able to capture sentiments more accurately in various practical applications, including economic and financial forecasting. In particular, the extraction of sentiment from news text, social media and blogs for the prediction of economic and financial variables has attracted attention in recent years. Despite many successful applications of sentiment analysis (SA) in these domains, the range of semantic techniques employed is still limited and mostly focused on the detection of sentiment at a coarse-grained level, that is, whether the sentiment expressed by the entire text of a sentence is either positive or negative. This paper proposes a novel methodology for Fine-Grained Aspect-based Sentiment (FiGAS) analysis. The aim of the approach is to identify the sentiment associated with specific topics of interest in each sentence of a document and to assign real-valued polarity scores between -1 and +1 to those topics. The proposed approach is completely unsupervised and customized to the economic and financial domains by using a specialised lexicon made available along with the source code of FiGAS. Our lexicon-based SA approach relies on a detailed set of semantic polarity rules that make it possible to trace the origin of sentiment, in the spirit of the recent trend on Interpretable AI. We provide an in-depth comparison of the performance of the FiGAS algorithm relative to other popular lexicon-based SA approaches in predicting a human-annotated dataset in the economic and financial domains. Our results indicate that FiGAS statistically outperforms the other methods by providing a sentiment score that is closer to that of the human annotators.

Keywords: Natural Language Processing, Sentiment Analysis, Unsupervised Machine Learning, Interpretability, Sentiment dictionaries, Economy and Finance

Electronic copy available at: https://ssrn.com/abstract=3766194

1. Introduction

Forecasting the performance of economies and financial markets is an extremely challenging task. Economic and financial variables are characterized by low signal-to-noise ratio, regime changes, and volatility clustering, all of which make them difficult to forecast [1, 2]. One additional complication when forecasting macroeconomic data is that statistical agencies release economic indicators, like Gross Domestic Product (GDP), at low frequencies (e.g., monthly or even quarterly) and with long publication delays. At the same time, accurate economic and financial forecasting is of paramount importance given that policymakers’ decisions and agendas rely heavily on it, especially during turbulent times like the COVID-19 pandemic [3, 4].

In this context, the vast amount of novel and alternative data sources available nowadays [5], together with the growth in computational power and the advancement in artificial intelligence (AI) technologies, has greatly improved the forecasting of economic and financial dynamics [6, 7]. In particular, several studies have extracted the sentiment from textual data (e.g., social media, newspapers, or economic and financial microblogs) as an input to forecasting models [8, 9, 10, 11, 12]. As the famous American financier and stock investor Bernard Baruch stated a long time ago, “...what actually registers in the stock market’s fluctuations are not the events themselves, but the human reactions to these events” (taken from [13]). This statement suggests that well-targeted measures of sentiment could greatly improve forecasting models [14, 15], which can serve as a basis for economic and financial decisions [16].

There is increasing interest in economics and finance in the application of sentiment analysis (SA) [17, 18] to textual data [8, 19, 20, 21]. The exponentially increasing volume of financial reports, economic press releases, news articles, and social media such as Twitter, Facebook, etc., calls for automating the analysis of such a continuous flow of news data. SA is a Semantic Web technology, directly related to Natural Language Processing (NLP), that aims at understanding whether a certain textual message conveys a positive or negative sentiment with respect to a certain topic, or the overall contextual polarity or emotional reaction to a document, interaction, or event [17, 18]. Its outcome might be a quantitative or qualitative polarity (e.g., a value in [−1, +1], or a label such as extremely negative, negative, neutral, positive, extremely positive) or an emotional state (e.g., joy, anger, etc.).

A major source of textual data used in economic and financial SA is represented by news articles. They are particularly relevant because they discuss important events, economic press releases, and expert opinions, among others, that affect consumers’ perception of the economy in different ways. First, the news media convey the latest economic data and professionals’ opinion to consumers. Second, consumers receive a signal about the economy through the tone and volume of economic reporting. Third, the greater the volume of news about the economy, the greater the likelihood that consumers will update their expectations about the economy [22].

SA in economics and finance has been performed by means of both lexicon-based (e.g., [23, 9, 24, 25]) and machine learning methods (e.g., [26, 27, 28, 13]). A lexicon-based SA approach [29] is an unsupervised technique that relies on a dictionary of words with assigned positive or negative sentiment polarity scores. Given a sentence, all its words are assigned a sentiment polarity value from the dictionary, and a combining function, such as the word count, sum or average [17], aggregates the scores into the overall sentiment of the text. A main issue with this approach is that it requires a customized sentiment dictionary constructed a priori to meet the specific domain lexicon. As we discuss in detail in Section 2, general sentiment dictionaries might not be comprehensive enough for certain specific domain lexicons. Furthermore, most lexicon-based SA methods focus on a coarse-grained analysis of the sentiment expressed in the text (see [30], [27], [31] and [32] among others), that is, they assess whether the sentiment of an entire sentence is either positive or negative [17]. However, coarse-grained methods might not be sufficiently accurate in evaluating the sentiment polarity of a specific topic of interest contained in a sentence, for instance a certain economic aspect or financial index, given that the sentiment of the entire text is often not expressed towards that specific topic [13]. For example, consider the following sentence: “Italian GDP continues to grow, despite industrial production is shrinking and the agricultural sector is suffering from the recent adverse meteorological conditions that hit the country.” From the perspective of the Italian GDP, the sentiment is clearly positive. However, the sentiment expressed for industrial production is negative, as is that for the agricultural sector. A coarse-grained SA method would assign a negative sentiment to the entire sentence, given that the majority of sentiment-bearing terms in the text have a negative connotation. This might lead to a very inaccurate representation of the sentiment if one is interested in only one particular economic topic.
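The contrast in the example sentence can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the toy lexicon and its scores are invented, and the topic-restricted score uses a naive split on commas and on the connectives "despite" and "and". It is not the FiGAS algorithm, which relies on linguistic polarity rules rather than on this kind of splitting.

```python
import re

# Hypothetical toy lexicon; the words and scores are invented for illustration.
TOY_LEXICON = {"grow": 0.6, "shrinking": -0.6, "suffering": -0.7, "adverse": -0.5}

SENTENCE = (
    "Italian GDP continues to grow, despite industrial production is "
    "shrinking and the agricultural sector is suffering from the recent "
    "adverse meteorological conditions that hit the country."
)


def mean_score(text):
    """Coarse-grained score: average polarity of all lexicon words in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    # Assumption: a text without any lexicon word is treated as neutral.
    return sum(scores) / len(scores) if scores else 0.0


def clause_score(sentence, topic):
    """Naive topic-restricted score: average only over the clause naming the topic."""
    # Crude clause boundaries: commas and the whole words "despite" and "and".
    clauses = re.split(r",|\bdespite\b|\band\b", sentence)
    for clause in clauses:
        if topic.lower() in clause.lower():
            return mean_score(clause)
    return 0.0  # topic not mentioned: treated as neutral


whole = mean_score(SENTENCE)
gdp = clause_score(SENTENCE, "GDP")
industry = clause_score(SENTENCE, "industrial production")
agriculture = clause_score(SENTENCE, "agricultural sector")
```

With this toy lexicon, the whole-sentence average is negative, while the clause that mentions GDP scores positive and the clauses on industrial production and the agricultural sector score negative, matching the reading of the sentence given above.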

An alternative approach consists of a fine-grained analysis of text that refers to a specific term of interest using a supervised machine learning model [28, 13, 33, 34]. A main disadvantage of such an approach is its dependence on labelled data. It is extremely difficult to ensure that sufficient and correctly labelled data can be obtained for a specific domain. In addition, a lexicon-based approach can be more easily understood and modified by a human, which is a significant advantage in a financial and economic context, where stakeholders require clear interpretability of the results.

In this paper we propose FiGAS, a novel lexicon-based SA algorithm especially developed for the economic and financial domains. The goal of FiGAS (Fine-Grained Aspect-based Sentiment analysis) is to compute a sentiment score of a sentence with respect to a specific topic of interest. The resulting sentiment score is a floating point value in the range between −1 (very negative/bearish) and +1 (very positive/bullish), with 0 designating the neutral sentiment. This score is calculated by means of a set of linguistic polarity rules applied to the terms semantically connected to the specific topic of interest in the sentence and with sentiment values derived from a fine-grained sentiment dictionary for financial and economic text.

Lexicon-based methods providing fine-grained SA for general text have already been explored [35, 36]. We build on these works to produce, to the best of our knowledge, the first lexicon-based, fine-grained SA method specialized for the economic and financial domains. Our choice of a lexicon-based fine-grained SA approach, rather than a machine learning one, results in a completely unsupervised algorithm, which does not require costly operations related to collecting and labeling a relevant training corpus. In addition, the proposed algorithm is interpretable [37, 38], since the adopted semantic scheme makes it possible to explain how the method arrives at the final sentiment polarity scores, providing greater transparency of the resulting analysis and thus facilitating its use and extension. FiGAS can be categorized as an SA approach from sentic computing [39], which aims at improving algorithm performance by including semantic features into opinion mining [8, 40]. In other words, the idea is to shift from a word-level to a concept-level analysis of sentiments. For a survey on sentic computing and discussions on new trends in SA we refer to [41].

We test the performance of FiGAS for the detection of positive and negative sentiment about a specific topic by comparing it against other popular lexicon-based SA approaches. The evaluation is made using a set of economic and financial sentences labelled by professional human annotators. The results obtained by the proposed algorithm are very promising: FiGAS outperforms the other tested methods and obtains sentiment scores similar to those reported by the professional human annotators in the large majority of cases.

The remainder of this paper is organized as follows. Section 2 describes classical approaches to lexicon-based SA and the associated dictionaries. The technical details of our FiGAS algorithm are provided in Section 3. The results of the empirical evaluation of the algorithm are presented in Section 4, along with a statistical comparison of the performance of the algorithm against the other popular lexicon-based SA approaches on a human-annotated dataset in the economic and financial domain (Section 4.1). In Section 4.2, we report an application example related to the construction of economic sentiment indicators from large text data using FiGAS, with potential for the forecasting or nowcasting of economic and financial variables. We conclude the paper with a summary and future work directions in Section 5.

2. Lexicon-based SA and dictionaries

Lexicon-based SA methods calculate the semantic sentiment or orientation of a piece of text, also referred to as sentiment polarity, by aggregating the semantic sentiment of its words or phrases [29]. For this purpose, a dictionary of positive and negative sentiment words, also referred to as sentiment or subjectivity lexicon, is required. In particular, a sentiment lexicon can include either a list of sentiment (or opinion) terms with their sentiment polarity scores, expressed as real-valued numbers ranging between a minimum sentiment score and a maximum sentiment score (e.g., −1 and +1, or 0 and 100, as in most of the common cases), or just the sets of positive, negative, and neutral terms (or sometimes their expressed emotions, e.g., anger, joy, fear, ..., for a multi-label categorization). For instance, the terms wonderful and growing are examples of words subjectively expressing a positive sentiment, whereas the terms disgusting and bad convey a negative sentiment.

When using a lexicon-based SA method, a text (such as a sentence or a document) is represented using a bag-of-words model [42], that is, disregarding grammar and word order, but keeping word frequencies. The sentiment lexicon is used to mark the positive and negative words contained in the text, which are aggregated to compute an overall sentiment score. If one uses a dictionary where terms are associated with their sentiment polarity scores, the most common choice is to take the average of the polarities of the words contained in the text [17, 8, 35, 19, 12], that is:

$$\frac{1}{k}\sum_{i=1}^{k} s(w_i), \qquad (1)$$

where $k$ is the total number of words in the text that are contained in the dictionary, and $s(w_i)$ is the sentiment polarity score associated to the word $w_i$.

Alternatively, if the sentiment lexicon only lists positive, negative, and neutral terms, without providing a granular intensity score, a sentiment polarity value for the entire text can be derived indirectly by the counts of the contained words [43, 25, 31, 44], that is:

$$(p - n)/k, \qquad (2)$$

where $k$ is the total number of words contained in the text, while $p$ and $n$ are the total number of positive and negative words, respectively.
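The two aggregation rules in Equations (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the lexicon entries and their scores are invented, the tokenizer is a simple regular expression, and a text with no usable words is treated as neutral (a case the equations leave undefined).

```python
import re

# Hypothetical scored lexicon; words and polarities are invented for illustration.
TOY_SCORES = {"wonderful": 0.9, "growing": 0.5, "bad": -0.8, "disgusting": -0.9}

# Hypothetical category-only lexicon derived from the signs of the scores above.
TOY_POSITIVE = {w for w, s in TOY_SCORES.items() if s > 0}
TOY_NEGATIVE = {w for w, s in TOY_SCORES.items() if s < 0}


def tokenize(text):
    """Bag-of-words tokens: lowercase alphabetic runs, word order ignored downstream."""
    return re.findall(r"[a-z]+", text.lower())


def average_polarity(text, scores):
    """Equation (1): mean polarity over the k words of the text found in the lexicon."""
    found = [scores[w] for w in tokenize(text) if w in scores]
    if not found:
        return 0.0  # assumption: no lexicon word means neutral
    return sum(found) / len(found)


def count_polarity(text, positive, negative):
    """Equation (2): (p - n) / k, with k the total number of words in the text."""
    words = tokenize(text)
    if not words:
        return 0.0  # assumption: empty text is neutral
    p = sum(w in positive for w in words)
    n = sum(w in negative for w in words)
    return (p - n) / len(words)


eq1 = average_polarity("A wonderful year for a growing economy", TOY_SCORES)
eq2 = count_polarity("bad bad wonderful day", TOY_POSITIVE, TOY_NEGATIVE)
```

Note the different denominators: Equation (1) divides by the number of words found in the dictionary, whereas Equation (2) divides by the total number of words in the text, so neutral words dilute the second score but not the first.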

Most of the existing SA approaches perform a coarse-grained analysis of the sentiment expressed in a text. That is, they evaluate the sentiment polarity of a certain text by considering all the expressions of sentiment, without targeting a specific topic. For short texts, like tweets or news headlines, this approach might be suitable, since the majority of sentiment expressions in these texts are relevant to the topic of interest. However, when texts are longer, such as in the case of news articles, these methods can be inaccurate since they often identify sentiment expressions that are irrelevant to the topic(s) of interest. To overcome these limitations, several papers have recently proposed fine-grained SA methods [35, 28, 13]. The novel aspect of these approaches is that they focus on the particular phrases which express sentiment with respect to a specific topic of interest, and analyze these expressions in a fine-grained manner. Our proposed SA method belongs to this category, as described in detail in Section 3.

Broadly speaking, SA lexicons can be built by human annotation or by using automatic generation mechanisms [17]. In addition, they can be general-purpose (i.e., usable for various applications since they are constructed to cover common lexicon) or domain-specific (i.e., limited or specialized in purpose to suit a specific application domain). In addition to domain specificity, there is also language specificity. The majority of sentiment lexicons have been constructed for the English language, while multilinguality still remains
