Fine-grained, aspect-based sentiment analysis on
economic and financial lexicon
Sergio Consoli, Luca Barbaglia, and Sebastiano Manzan
European Commission, Joint Research Centre (DG-JRC)
Directorate A-Strategy, Work Programme and Resources Scientific Development Unit
Via E. Fermi 2749, I-21027 Ispra (VA), Italy.
Email: [name.surname]@ec.europa.eu
Abstract
The last two decades have seen a tremendous increase in the adoption of Semantic Web technologies as a result of the availability of big data, the growth in computational power and the advancement of artificial intelligence (AI) technologies. Cutting-edge semantic techniques are now able to capture sentiments more accurately in various practical applications, including economic and financial forecasting. In particular, the extraction of sentiment from news text, social media and blogs for the prediction of economic and financial variables has attracted attention in recent years. Despite many successful applications of sentiment analysis (SA) in these domains, the range of semantic techniques employed is still limited and mostly focused on the detection of sentiment at a coarse-grained level, that is, whether the sentiment expressed by the entire text of a sentence is either positive or negative. This paper proposes a novel methodology for Fine-Grained Aspect-based Sentiment (FiGAS) analysis. The aim of the approach is to identify the sentiment associated with specific topics of interest in each sentence of a document and to assign real-valued polarity scores between -1 and +1 to those topics. The proposed approach is completely unsupervised and customized to the economic and financial domains by using a specialised lexicon made available along with the source code of FiGAS. Our lexicon-based SA approach relies on a detailed set of semantic polarity rules that allow one to understand the origin of sentiment, in the spirit of the recent trend toward Interpretable AI. We provide an in-depth comparison of the performance of the FiGAS algorithm relative to other popular lexicon-based SA approaches in predicting a human-annotated dataset in the economic and financial domains. Our results indicate that FiGAS statistically outperforms the other methods by providing a sentiment score that is closer to that of the human annotators.
Keywords: Natural Language Processing, Sentiment Analysis, Unsupervised
Machine Learning, Interpretability, Sentiment dictionaries, Economy and
Finance
Electronic copy available at: https://ssrn.com/abstract=3766194
1. Introduction
Forecasting the performance of economies and financial markets is an extremely challenging task. Economic and financial variables are characterized by low signal-to-noise ratios, regime changes, and volatility clustering, all of which make forecasting difficult [1, 2]. An additional complication when forecasting macroeconomic data is that statistical agencies release economic indicators, like Gross Domestic Product (GDP), at low frequencies (e.g., monthly or even quarterly) and with long publication delays. At the same time, accurate economic and financial forecasting is of paramount importance, given that policymakers’ decisions and agendas rely heavily on it, especially during turbulent times like the COVID-19 pandemic [3, 4].
In this context, the vast amount of novel and alternative data sources available nowadays [5], together with the growth in computational power and the advancement of artificial intelligence (AI) technologies, has greatly improved the forecasting of economic and financial dynamics [6, 7]. In particular, several studies have extracted the sentiment from textual data (e.g., social media, newspapers, or economic and financial microblogs) as an input to forecasting models [8, 9, 10, 11, 12]. As the famous American financier and stock investor Bernard Baruch stated a long time ago, “...what actually registers in the stock market’s fluctuations are not the events themselves, but the human reactions to these events” (taken from [13]). This statement suggests that well-targeted measures of sentiment could greatly improve forecasting models [14, 15], which can serve as a basis for economic and financial decisions [16].
There is increasing interest in economics and finance in the application of sentiment analysis (SA) [17, 18] to textual data [8, 19, 20, 21]. The exponentially increasing volume of financial reports, economic press releases, news articles, and social media posts (e.g., on Twitter or Facebook) calls for automating the analysis of such a continuous flow of news data. SA is a Semantic Web technology, directly related to Natural Language Processing (NLP), that aims at understanding whether a certain textual message conveys a positive or negative sentiment with respect to a certain topic, or the overall contextual polarity or emotional reaction to a document, interaction, or event [17, 18]. Its outcome might be a quantitative or qualitative polarity (e.g., a real value in [−1, +1], or a label such as extremely negative, negative, neutral, positive, extremely positive) or an emotional state (e.g., joy, anger).
News articles are a major source of textual data for economic and financial SA. They are particularly relevant because they discuss important events, economic press releases, and expert opinions, among other things, that affect consumers’ perception of the economy in different ways. First, the news media convey the latest economic data and professionals’ opinions to consumers. Second, consumers receive a signal about the economy through the tone and volume of economic reporting. Third, the greater the volume of news about the economy, the greater the likelihood that consumers will update their expectations about the economy [22].
SA in economics and finance has been performed by means of both lexicon-based methods (e.g., [23, 9, 24, 25]) and machine learning methods (e.g., [26, 27, 28, 13]).
A lexicon-based SA approach [29] is an unsupervised technique that relies on a dictionary of words with assigned positive or negative sentiment polarity scores. Given a sentence, each of its words found in the dictionary is assigned a sentiment polarity value, and a combining function, such as the word count, sum or average [17], aggregates the scores into the overall sentiment of the text. A main issue with this approach is that it requires a customized sentiment dictionary constructed a priori to match the specific domain lexicon. As we discuss in detail in Section 2, general sentiment dictionaries might not be comprehensive enough for certain domain-specific lexicons. Furthermore, most lexicon-based SA methods focus on a coarse-grained analysis of the sentiment expressed in the text (see [30], [27], [31] and [32] among others), that is, they assess whether the sentiment of an entire sentence is positive or negative [17]. However, coarse-grained methods might not be sufficiently accurate in evaluating the sentiment polarity of a specific topic of interest contained in a sentence, for instance a certain economic aspect or financial index, given that the sentiment of the entire text is often not expressed towards that specific topic [13]. For example, consider the following sentence: “Italian GDP continues to grow, despite industrial production is shrinking and the agricultural sector is suffering from the recent adverse meteorological conditions that hit the country.” From the perspective of the Italian GDP, the sentiment is clearly positive. However, the sentiment expressed for industrial production is negative, as is that for the agricultural sector. A coarse-grained SA method, on the other hand, would assign a negative sentiment to the entire sentence, given that the majority of terms in the text have a negative connotation. This can lead to a very inaccurate measure of sentiment if one is interested in only a particular economic topic, such as GDP in this example.
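To make the contrast concrete, the following Python sketch scores the example sentence in two ways. The polarity values in `TOY_LEXICON` are hypothetical numbers chosen for illustration (they are not taken from the FiGAS dictionary), and `naive_topic_score` is a deliberately crude clause-splitting heuristic, not the FiGAS polarity rules; it only shows why restricting attention to the words connected to a topic can flip the sign of the score.

```python
import re

# Hypothetical polarity scores for illustration only; NOT the FiGAS dictionary.
TOY_LEXICON = {"grow": 0.6, "shrinking": -0.5, "suffering": -0.7, "adverse": -0.6}

SENTENCE = (
    "Italian GDP continues to grow, despite industrial production is shrinking "
    "and the agricultural sector is suffering from the recent adverse "
    "meteorological conditions that hit the country."
)


def tokenize(text: str) -> list[str]:
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def average_score(tokens: list[str], lexicon: dict[str, float]) -> float:
    """Average polarity of the tokens found in the lexicon (0.0 if none)."""
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


def coarse_score(text: str, lexicon: dict[str, float]) -> float:
    """Coarse-grained: a single score for the whole sentence."""
    return average_score(tokenize(text), lexicon)


def naive_topic_score(text: str, topic: str, lexicon: dict[str, float]) -> float:
    """Toy topic-restricted score: average only over clauses mentioning the topic.

    Clauses are split on commas and on the words "despite" and "and"; this is a
    crude stand-in for real syntactic analysis, used here only for illustration.
    """
    clauses = re.split(r",|\bdespite\b|\band\b", text.lower())
    relevant = [c for c in clauses if topic.lower() in c]
    return average_score(tokenize(" ".join(relevant)), lexicon)


print(coarse_score(SENTENCE, TOY_LEXICON))  # negative for the whole sentence
for topic in ("Italian GDP", "industrial production", "agricultural sector"):
    print(topic, naive_topic_score(SENTENCE, topic, TOY_LEXICON))
```

With these toy scores, the coarse-grained average over the four dictionary words is negative, while the clause mentioning Italian GDP contains only the positive term grow. Splitting on commas and conjunctions breaks down quickly on real text; a method such as FiGAS instead applies linguistic rules to the terms semantically connected to the topic.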
An alternative approach consists of a fine-grained analysis of text referring to a specific term of interest using a supervised machine learning model [28, 13, 33, 34]. A main disadvantage of such an approach is its dependence on labelled data: it is extremely difficult to ensure that sufficient and correctly labelled data can be obtained for a specific domain. Moreover, the fact that a lexicon-based approach can be more easily understood and modified by a human is a significant advantage in a financial and economic context, where stakeholders require clear interpretability of the results.
In this paper we propose FiGAS, a novel lexicon-based SA algorithm developed specifically for the economic and financial domains. The goal of FiGAS (Fine-Grained Aspect-based Sentiment analysis) is to compute a sentiment score of a sentence with respect to a specific topic of interest. The resulting sentiment score is a floating-point value in the range between −1 (very negative/bearish) and +1 (very positive/bullish), with 0 denoting neutral sentiment. The score is calculated by applying a set of linguistic polarity rules to the terms that are semantically connected to the topic of interest in the sentence, using sentiment values derived from a fine-grained sentiment dictionary for financial and economic text.
Although lexicon-based methods providing fine-grained SA for general text have already been explored [35, 36], we build on these works to produce, to the best of our knowledge, the first lexicon-based, fine-grained SA method specialized to the economic and financial domains. Our choice of a lexicon-based fine-grained SA approach, rather than a machine-learning one, results in a completely unsupervised algorithm, which does not require the costly collection and labelling of a relevant training corpus. In addition, the proposed algorithm is interpretable [37, 38], since the adopted semantic scheme makes it possible to explain how the method arrives at the final sentiment polarity scores, providing greater transparency of the resulting analysis and thus facilitating its use and extension. FiGAS can be categorized as an SA approach from sentic computing [39], which aims at improving algorithm performance by including semantic features in opinion mining [8, 40]. In other words, the idea is to shift from a word-level to a concept-level analysis of sentiment. For a survey on sentic computing and a discussion of new trends in SA, we refer to [41].
We test the performance of FiGAS in detecting positive and negative sentiment about a specific topic by comparing it against other popular lexicon-based SA approaches. The evaluation uses a set of economic and financial sentences labelled by professional human annotators. The results obtained by the proposed algorithm are very promising: FiGAS outperforms the other tested methods and obtains sentiment scores similar to those reported by the professional human annotators in the large majority of cases.
The remainder of this paper is organized as follows. Section 2 describes classical approaches and dictionaries for lexicon-based SA. The technical details of our FiGAS algorithm are provided in Section 3. The results of the empirical evaluation of the algorithm are presented in Section 4, along with a statistical comparison of its performance against other popular lexicon-based SA approaches on a human-annotated dataset in the economic and financial domains (Section 4.1). In Section 4.2, we report an application example related to the construction of economic sentiment indicators from large text data using FiGAS, with potential for the forecasting or nowcasting of economic and financial variables. We conclude the paper with a summary and future work directions in Section 5.
2. Lexicon-based SA and dictionaries
Lexicon-based SA methods calculate the semantic sentiment or orientation of a piece of text, also referred to as sentiment polarity, by aggregating the semantic sentiment of its words or phrases [29]. For this purpose, a dictionary of positive and negative sentiment words, also referred to as a sentiment or subjectivity lexicon, is required. In particular, a sentiment lexicon can include either a list of sentiment (or opinion) terms with their sentiment polarity scores, expressed as real-valued numbers ranging between a minimum and a maximum sentiment score (e.g., −1 and +1, or 0 and 100, as in most common cases), or just the sets of positive, negative, and neutral terms (or sometimes the emotions they express, e.g., anger, joy, fear, for a multi-label categorization). For instance, the terms wonderful and growing are examples of words subjectively expressing a positive sentiment, whereas the terms disgusting and bad convey a negative sentiment.
When using a lexicon-based SA method, a text (such as a sentence or a document) is represented using a bag-of-words model [42], that is, disregarding grammar and word order but keeping word frequencies. The sentiment lexicon is used to mark the positive and negative words contained in the text, which are then aggregated to compute an overall sentiment score. If one uses a dictionary where terms are associated with sentiment polarity scores, the most common choice is to take the average of the polarities of the words contained in the text [17, 8, 35, 19, 12], that is:
$$\sum_{i=1}^{k} s(w_i)/k, \qquad (1)$$
where k is the total number of words in the text that are contained in the dictionary, and $s(w_i)$ is the sentiment polarity score associated to the word $w_i$.
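A minimal Python sketch of Eq. (1), assuming a bag-of-words representation built with `collections.Counter`, so that a dictionary word occurring twice contributes twice to both the sum and k. The lexicon values below are hypothetical, and returning 0.0 (neutral) when no dictionary word occurs is an assumed convention, not part of Eq. (1).

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Bag-of-words: grammar and word order are discarded, frequencies kept."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def eq1_average(text: str, lexicon: dict[str, float]) -> float:
    """Eq. (1): mean polarity s(w_i) over the k dictionary-word occurrences."""
    bow = bag_of_words(text)
    # k = number of word occurrences in the text that appear in the dictionary
    k = sum(count for word, count in bow.items() if word in lexicon)
    if k == 0:
        return 0.0  # assumption: no dictionary words -> neutral score
    total = sum(lexicon[word] * count for word, count in bow.items() if word in lexicon)
    return total / k


# Hypothetical polarity scores in [-1, +1], for illustration only.
lexicon = {"growth": 0.5, "strong": 0.4, "risk": -0.6}
print(eq1_average("Strong growth, strong demand, but some risk.", lexicon))
```

Here k = 4 (strong twice, growth and risk once each), while demand, but and some are ignored because they are not in the dictionary. Because the bag-of-words model discards order, any permutation of the same words yields the same score.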
Alternatively, if the sentiment lexicon only lists positive, negative, and neutral
terms, without providing a granular intensity score, a sentiment polarity value
for the entire text can be derived indirectly by the counts of the contained words
[43, 25, 31, 44], that is:
$$(p - n)/k, \qquad (2)$$
where k is the total number of words contained in the text, while p and n are
the total number of positive and negative words, respectively.
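Eq. (2) can be sketched in the same way. The positive and negative word sets below are hypothetical, and, following the definition above, k counts all words in the text, not only those found in the lexicon; returning 0.0 for an empty text is an assumed convention.

```python
import re


def eq2_count_score(text: str, positive: set[str], negative: set[str]) -> float:
    """Eq. (2): (p - n) / k, with k = total number of words in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    k = len(tokens)
    if k == 0:
        return 0.0  # assumption: empty text -> neutral score
    p = sum(1 for t in tokens if t in positive)  # positive-word occurrences
    n = sum(1 for t in tokens if t in negative)  # negative-word occurrences
    return (p - n) / k


# Hypothetical word lists, for illustration only.
positive_words = {"growth", "strong"}
negative_words = {"risk", "decline"}
print(eq2_count_score("Strong growth despite risk", positive_words, negative_words))  # 0.25
```

Because k includes neutral and out-of-dictionary words, scores from Eq. (2) shrink toward zero for long texts containing few sentiment words, whereas Eq. (1) averages only over the dictionary hits.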
Most of the existing SA approaches perform a coarse-grained analysis of the
sentiment expressed in a text. That is, they evaluate the sentiment polarity of
a certain text by considering all the expressions of sentiment, without targeting
a specific topic. For short texts, like tweets or news headlines, this approach
might be suitable, since the majority of sentiment expressions in these texts
are relevant to the topic of interest. However, when texts are longer, such as
in the case of news articles, these methods can be inaccurate since they often
identify sentiment expressions that are irrelevant to the topic(s) of interest. To
overcome these limitations, several papers have recently proposed fine-grained
SA methods [35, 28, 13]. The novel aspect of these approaches is that they focus
on the particular phrases which express sentiment with respect to a specific
topic of interest, and analyze these expressions in a fine-grained manner. Our proposed SA method belongs to this category, as described in more detail in Section 3.
Broadly speaking, SA lexicons can be built by human annotation or by using automatic generation mechanisms [17]. They can also be general-purpose (i.e., usable for various applications, since they are constructed to cover common lexicon) or domain-specific (i.e., limited or specialized in purpose to suit a specific application domain). Besides domain specificity, there is also language specificity. The majority of sentiment lexicons have been constructed for the English language, while multilinguality still remains