Real-Time Stock Trend Prediction via Sentiment Analysis
of News Article
Sanmoy Paul and Shashank Vishnoi
Department of Data Science (Business Analytics), NMIMS University, Mumbai
sanmoy.nmims@gmail.com, shashank.vishnoi.nmims@gmail.com
Abstract
The stock market is volatile, and volatility occurs in clusters; price fluctuations driven by sentiment
and news reports are common. Traders use a wide variety of publicly available information to guide
their market decisions. This paper offers advice to traders based on sentiment analysis of publicly
available news reports. It rests on the hypothesis that news articles have an impact on the stock
market. Under this hypothesis we study the relationship between news and stock trends, and show that
negative news has a persistent effect on the stock market. To test the hypothesis, a semi-supervised
learning technique is used to build the final news classification model. The results show that an SVM
with TF-IDF features performs well in further analysis. The prediction model achieves an accuracy
above 90% and a 52% correlation with the return label of a stock.
Keywords: Text Mining, Human Sentiments, KNN, Random Forest, Multinomial Naïve Bayes,
linear SVM, News.
1. Introduction
The Indian stock market is unpredictable by nature, and numerous attempts have been made to predict
its movement. According to behavioural economics, an individual's economic decisions are affected by
psychological, cognitive, emotional, cultural and social factors. Is there any way to predict the
stock market? A large amount of information affects the price movements of a stock, which makes it
difficult for individual investors to predict the direction of those movements. Monitoring such
information in real time is even more difficult.
Electronic copy available at: https://ssrn.com/abstract=3753015
Stock prices depend not only on historical factors and other macro-econometric variables but also on
the moods and sentiments of investors as reflected in news articles. Incorporating such opinions and
sentiments can therefore improve on a baseline model that depends only on historical prices. The
volume of news articles has grown tremendously in recent times, so analysing such news, extracting
relevant information from it and using that information for prediction is an interesting problem. Our
main objective is to build and test a system that uses the sentiment and authorial tone of financial
news articles to better predict the short-term trend of the stock market.
2. Related work
Nguyen, Shirai, and Velcin (2015) [5] evaluated the effectiveness of sentiment analysis in stock
prediction. They compared different sentiment models over 18 stocks for a period of one year to
determine which method yielded the highest accuracy. The first method, price only, predicts the price
of a stock by time series analysis of historical data. The second method is human sentiment: 15.6% of
the messages in the message board dataset are explicitly labelled with one of five classes (Strong
Buy, Buy, Hold, Sell and Strong Sell), and an SVM is used as the prediction model. The third method is
sentiment classification: for the remaining 84.4% of the messages, which are not explicitly labelled,
a model is built to extract their sentiment and classify them into the same five categories. The
features are a bag of words from the title and content of each message, weighted by TF-IDF and fed
into an SVM classifier. The fourth method uses LDA (Latent Dirichlet Allocation) to extract hidden
topics. The LDA model is trained on the training set, and topics for the unseen test set are inferred
by Gibbs sampling. The probability of each topic for each message is then calculated, and these
probabilities, together with the lagged prices, are fed into the SVM prediction model. The fifth
method is the JST model, which extracts topics and sentiments simultaneously. Using Gibbs sampling, 50
topics and 3 sentiments are chosen. Next, the joint probability of topic and sentiment is calculated,
and these probabilities are integrated into the prediction model. The sixth and final method is the
aspect sentiment model, in which topics are extracted from the training set and topics occurring fewer
than 10 times are removed. Based on the topic list, sentiment values are extracted, opinion words are
identified using SentiWordNet, and sentiment scores are assigned for positivity, objectivity and
negativity. The aspect sentiment model combined with the prediction model yielded the best accuracy.
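The TF-IDF weighting of a bag of words, as used in the sentiment classification method above, can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the helper names, the example messages and the particular TF-IDF variant (term frequency normalised by document length, idf = log(N / df)) are assumptions, since the text above does not state which variant was used.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a real system would also strip punctuation."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical message-board posts, invented for illustration.
messages = [
    "profit rises strong buy",
    "profit falls sell now",
    "profit steady hold",
]
vectors = tfidf_vectors(messages)
```

A term that appears in every message, such as "profit" here, receives weight zero, while a term confined to one message keeps a positive weight. The resulting sparse vectors are the kind of input that would then be passed to a classifier such as an SVM.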
Li, Xie, Chen, Wang, and Deng (2014) [4] used the Harvard psychological dictionary and the
Loughran-McDonald financial sentiment dictionary to construct a sentiment space, and compared
prediction accuracy and performance across different market classifications. Each document is
represented as a vector of sentiment values obtained by summing the sentiment vectors of the words in
the document. The news is collected from FINET. Two manually built dictionaries are used for
sentiment classification, the Harvard Sentiment Dictionary (HVD) and the Loughran-McDonald Dictionary
(LMD), with an SVM as the text classification model. Two further approaches are SenticNet (SN) and bag
of words (BoW). A sentiment polarity approach is also used, which classifies sentiment into three
categories: positive, negative and neutral. Comparison on the validation set shows that HVD, LMD and
SN perform better than bag of words, and that HVD, LMD and BoW perform better than the polarity
approaches. After comparing all the approaches, LMD is found to be the best for computing sentiment
scores.
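Representing a document by the sum of its words' sentiment vectors, as described above, might look like the sketch below. The two-dimensional lexicon and its entries are hypothetical stand-ins; a real system would load entries from a resource such as HVD or LMD, whose actual dimensions are not reproduced here.

```python
# Hypothetical lexicon: word -> (positive, negative) scores.
LEXICON = {
    "gain": (1.0, 0.0),
    "growth": (1.0, 0.0),
    "loss": (0.0, 1.0),
    "lawsuit": (0.0, 1.0),
}


def document_sentiment(text, lexicon=LEXICON):
    """Sum the sentiment vectors of all lexicon words found in the text."""
    positive = negative = 0.0
    for word in text.lower().split():
        pos, neg = lexicon.get(word, (0.0, 0.0))  # unknown words contribute nothing
        positive += pos
        negative += neg
    return (positive, negative)


def polarity_label(vector):
    """Collapse a (positive, negative) vector into one of three classes."""
    positive, negative = vector
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"


vector = document_sentiment("Quarterly gain offsets lawsuit loss")
label = polarity_label(vector)
```

The full vector preserves how much positive and negative vocabulary a document contains, whereas the polarity label discards that information; this is one plausible reading of why the dictionary-based vectors compared favourably with the polarity approaches.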
Shri Bharathi and Geetha (2017) [7] used RSS (Really Simple Syndication) feeds for prediction of the
stock of Arab Bank Company. The RSS feeds collect stock market news, which serves as the dataset. The
system automatically identifies the opinions in the news from the RSS feeds and then predicts the
stock movement. All RSS feeds are stored as a document in an input sentence module. A sentence
splitting module cleans the news feeds and splits them into parsed sentences. The NLP module
identifies and extracts subjective information from the source material. A part-of-speech tagger
reads the text and assigns a part of speech, such as noun, adjective or verb, to each word. A
dictionary-based approach is used to find opinion words and their polarities; it uses antonyms,
synonyms and hierarchies in WordNet to determine word sentiment. The sentence polarity module
calculates the polarity of each sentence, and the polarity score is classified as positive, neutral or
negative. If the opinion is positive, the stock is predicted to go up; otherwise it is predicted to go
down. The polarity values of 499 sentences covering one month are calculated and classified into
positive, negative and neutral. The result helps traders decide whether to buy or sell the stock.
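The sentence splitting and sentence polarity steps of such a pipeline can be sketched as below. The opinion-word list, the regular expressions and the function names are assumptions made for illustration; the cited system derives word polarities from WordNet rather than from a fixed table.

```python
import re
from collections import Counter

# Hypothetical opinion words with signed polarities (+1 positive, -1 negative).
OPINION_WORDS = {"surge": 1, "strong": 1, "weak": -1, "decline": -1, "fraud": -1}


def split_sentences(text):
    """Split on sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_polarity(sentence):
    """Sum the polarities of the opinion words in one sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return sum(OPINION_WORDS.get(word, 0) for word in words)


def classify(score):
    """Map a polarity score to positive, negative or neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tally_feed(text):
    """Count positive, neutral and negative sentences in a news feed."""
    return Counter(classify(sentence_polarity(s)) for s in split_sentences(text))


feed = "Shares surge on strong earnings. The board met on Tuesday. Analysts fear a decline."
counts = tally_feed(feed)
```

The tally over a period, for example a month of sentences, gives the balance of positive and negative opinion from which a buy or sell signal could be derived.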
Lee, Surdeanu, Maccartney, and Jurafsky (2014) [3] used the 8-K financial reports of all S&P 500
companies between 2002 and 2012 for stock prediction. The linguistic features are lemmatized
unigrams. Features occurring fewer than 10 times are removed, and PMI (pointwise mutual information)
is used to select which linguistic features to retain. NMF (non-negative matrix factorization) is
applied for dimensionality reduction, and the resulting vector, combined with the baseline features,
is fed into a random forest classifier with 2000 trees and compared with the baseline models.
Baseline 1 is a deterministic system that predicts UP when the actual earnings are better than
expected. Baseline 2 uses 21 financial features, and the unigram model adds unigram features to these
21 financial features. An ensemble model is then built based on the NMF dimensions. Linguistic
features improve performance over non-linguistic features alone, in particular the predictive power in
the short term. The SentiWordNet lexicon is used to score words: positive words receive high scores
and negative words receive low scores. Bigrams and word clustering are also used to combine two or
more words, but the bigram model does not improve performance significantly over the unigram model.
The unigram model thus improves short-term prediction accuracy.
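The frequency cut-off and PMI-based feature selection described above can be illustrated with the sketch below. It is written under stated assumptions: PMI is computed at document level between a unigram and the UP label as log(P(f, UP) / (P(f) P(UP))), and the function name, label strings and toy documents are hypothetical. How the cited authors computed and thresholded PMI is not specified in the text above.

```python
import math
from collections import Counter


def select_features(docs, labels, min_count=10):
    """Return {feature: PMI with the "UP" label} for sufficiently frequent unigrams.

    docs   : list of token lists (assumed already lemmatized)
    labels : list of "UP"/"DOWN" strings, one per document
    """
    n = len(docs)
    if n == 0:
        return {}
    total_count = Counter(tok for doc in docs for tok in doc)
    doc_count = Counter()    # documents containing the feature
    joint_count = Counter()  # documents containing the feature and labelled UP
    for doc, label in zip(docs, labels):
        for tok in set(doc):
            doc_count[tok] += 1
            if label == "UP":
                joint_count[tok] += 1
    p_up = sum(1 for label in labels if label == "UP") / n
    scores = {}
    for tok, count in total_count.items():
        if count < min_count or joint_count[tok] == 0:
            continue  # too rare, or never seen with UP (PMI would be -infinity)
        p_tok = doc_count[tok] / n
        p_joint = joint_count[tok] / n
        scores[tok] = math.log(p_joint / (p_tok * p_up))
    return scores


# Toy corpus with a lowered cut-off so that some features survive.
docs = [["profit", "beat"], ["profit", "miss"], ["beat", "raise"], ["miss", "cut"]]
labels = ["UP", "DOWN", "UP", "DOWN"]
scores = select_features(docs, labels, min_count=2)
```

In the toy corpus, "beat" occurs only in UP documents and receives a positive PMI, whereas "profit" occurs equally under both labels and receives a PMI of zero; features could then be ranked by score and the top ones retained.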
Zhai, Cohen, and Atreya (2011) [8] developed Java code and used the Stanford Classifier to analyse
financial news articles quickly and, with this model, predict the S&P 500 index. They used every
article published in The New York Times from January 1987 to June 2007, annotated with date, category
and a set of tags describing the content of the article. Two methods of determining the sentiment of
news articles were tested: manual classification and automatic classification using stock market
results. Manual classification involved reading each article and assigning it a sentiment tag:
positive, neutral or negative. A class, NYT Manual Classifier, was built to aid in this process.
Because manual classification is time consuming, only two months' worth of articles could be
classified; January and June 2006 were chosen. January contains many articles summarizing the results
of the previous year and speculating on the upcoming year. In June, journalists may be more focused on
day-to-day movements of the stock market. The automatic approach used market movement in the form of
the log return: the log of today's close divided by yesterday's close. Although the system provides
interesting analysis of market sentiment in hindsight, it is less effective when used for predictive
purposes. The sentiment results could instead be an input to another trading system or simply be
given to human traders to aid their judgment.
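The log return used by the automatic approach is ln(close_t / close_{t-1}). The sketch below computes it from a list of closing prices and derives a sentiment tag from its sign. The mapping from return to tag, the threshold parameter and the prices are assumptions for illustration; the text above states only that the log return of the market was used.

```python
import math


def log_returns(closes):
    """Log return for each day after the first: ln(close_t / close_{t-1})."""
    return [math.log(today / yesterday) for yesterday, today in zip(closes, closes[1:])]


def label_from_return(log_return, threshold=0.0):
    """Map a log return to a sentiment tag; the threshold is a hypothetical parameter."""
    if log_return > threshold:
        return "positive"
    if log_return < -threshold:
        return "negative"
    return "neutral"


closes = [100.0, 102.0, 102.0, 99.0]  # hypothetical closing prices
returns = log_returns(closes)
labels = [label_from_return(r) for r in returns]
```

Articles published on a given day would then inherit that day's tag as their training label, which avoids manual annotation at the cost of labelling every article with the market's overall direction.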
Zhang and Skiena (2010) [9] used quantitative media data (blogs and news, for comparison) generated
by a large-scale natural language processing system and performed a comparative study of blogs and
news, a large-scale analysis, and sentiment-oriented equity trading. They used Stocks