Real-Time Stock Trend Prediction via Sentiment Analysis of News Article

Sanmoy Paul and Shashank Vishnoi

Department of Data Science (Business Analytics), NMIMS University, Mumbai

sanmoy.nmims@gmail.com, shashank.vishnoi.nmims@gmail.com

Abstract

The stock market is volatile, and volatility occurs in clusters; price fluctuations driven by sentiment and news reports are common. Traders use a wide variety of publicly available information to forecast the market and guide their decisions. This paper proposes a way to advise traders on stock trading using sentiment analysis of publicly available news reports. It rests on the hypothesis that news articles have an impact on the stock market. Under this hypothesis we study the relationship between news and stock trend, and we show that negative news has a persistent effect on the stock market. To test the hypothesis, a semi-supervised learning technique is used to build the final news classification model. The research shows that an SVM with TF-IDF features performs well in further analysis. The accuracy of the prediction model is more than 90%, with a 52% correlation with the return label of a stock.

Keywords: Text Mining, Human Sentiments, KNN, Random Forest, Multinomial Naïve Bayes, linear SVM, News.

1. Introduction

The Indian stock market is unpredictable in nature, and numerous attempts have been made to predict its movement. According to behavioural economics, every individual's economic decisions are affected by psychological, cognitive, emotional, cultural and social factors. Is there any way to predict the stock market? A great deal of information affects the price movements of a stock, which makes it difficult for individual investors to predict the direction of those movements. Monitoring such information in real time is even more difficult.

Nowadays, stock prices depend not only on historical factors and other macro-econometric variables but also on the moods and sentiments of investors as reflected in news articles. Incorporating such opinions and sentiments can therefore improve on a baseline model that depends only on historical prices. In recent times the number of news articles has increased tremendously, so analysing this news, extracting relevant information from it and using that information in prediction becomes an interesting problem. Our main objective is to build and test a system that uses the sentiment and author tone expressed in financial news articles to better predict the short-term trend of the stock market.

Electronic copy available at: https://ssrn.com/abstract=3753015

2. Related work

Nguyen, Shirai, and Velcin (2015) [5] evaluated the effectiveness of sentiment analysis in stock prediction. They compared different sentiment models over 18 stocks for a period of one year to determine which method yielded the highest accuracy. The first method, the price-only method, predicts the price of a stock using time series analysis of historical data. The second method is Human Sentiment, in which the 15.6% of messages in the message board dataset that are explicitly labelled fall into five labels: Strong Buy, Buy, Hold, Sell and Strong Sell. SVM is used as the prediction model. The third method is Sentiment Classification: for the remaining 84.4% of messages, which are not explicitly classified, a model is built to extract the sentiment of each message and classify it into the same five categories. The feature representation is a bag of words from the title and content of the message; the feature weights are determined by TF-IDF and the features are fed into an SVM classification model. The fourth method uses LDA (Latent Dirichlet Allocation) to extract hidden topics. The LDA model is trained on the training dataset, and topics for the unseen test set are inferred using Gibbs sampling. The probability of each topic for each message is then calculated, and these probabilities, together with the lagged prices, are fed into the SVM prediction model. The fifth method is the JST model, which extracts topics and sentiments simultaneously. With Gibbs sampling, 50 topics and 3 sentiments are chosen. The joint probability of topic and sentiment is then calculated and integrated into the prediction model. The sixth and final method is the Aspect Sentiment Model. Topics are extracted from the training dataset, and topics occurring fewer than 10 times are removed. Based on the resulting topic list, opinion words are identified using SentiWordNet and assigned sentiment scores for positivity, objectivity and negativity. The aspect sentiment model combined with the prediction model yielded the best accuracy.
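The TF-IDF weighting of a bag of words mentioned above can be illustrated with a minimal sketch. It assumes whitespace tokenization and the plain, unsmoothed form tf × log(N / df); the summary does not say which TF-IDF variant the cited authors used, and the function name and sample messages are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    Assumption: plain unsmoothed TF-IDF; other variants add smoothing terms.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical message-board posts, for illustration only.
messages = [
    "strong buy signal on earnings",
    "sell before earnings drop",
    "hold and wait",
]
weights = tfidf_vectors(messages)
# "buy" occurs in one message, "earnings" in two, so "buy" gets the larger weight.
print(weights[0]["buy"] > weights[0]["earnings"])
```

Each resulting dictionary is a sparse feature vector; in the pipeline described above, vectors of this kind would be passed to an SVM classifier.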

Li, Xie, Chen, Wang, and Deng (2014) [4] used the Harvard psychological dictionary and the Loughran-McDonald financial sentiment dictionary to construct a sentiment space, and compared prediction accuracy and performance across different market classifications. Each document is represented as a vector of sentiment values obtained by summing the sentiment vectors of the words in the document. The news is collected from FINET. Two manually built dictionaries, the Harvard Sentiment Dictionary (HVD) and the Loughran-McDonald Dictionary (LMD), are used for sentiment classification, with SVM as the text classification model. Two further approaches, SenticNet (SN) and bag of words (BoW), are also used, along with a sentiment polarity approach that classifies sentiment into three categories: positive, negative and neutral. Comparison on the validation set shows that HVD, LMD and SN perform better than bag of words, and that HVD, LMD and BoW perform better than the polarity approaches. Across all approaches, LMD is found to be the best for computing sentiment scores.
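Representing a document as the sum of its words' sentiment vectors can be sketched as follows. The three dimensions and the tiny lexicon are hypothetical stand-ins chosen for illustration; they are not the actual categories or entries of HVD or LMD.

```python
# Hypothetical toy lexicon: word -> (positive, negative, uncertainty) scores.
TOY_LEXICON = {
    "gain": (1.0, 0.0, 0.0),
    "profit": (1.0, 0.0, 0.0),
    "loss": (0.0, 1.0, 0.0),
    "lawsuit": (0.0, 1.0, 0.0),
    "may": (0.0, 0.0, 1.0),
}


def document_sentiment_vector(text, lexicon=TOY_LEXICON, dims=3):
    """Sum the sentiment vectors of every lexicon word found in the text."""
    totals = [0.0] * dims
    for word in text.lower().split():
        scores = lexicon.get(word)
        if scores is None:
            continue  # Words outside the lexicon contribute nothing.
        for i, score in enumerate(scores):
            totals[i] += score
    return totals


print(document_sentiment_vector("Profit gain may offset loss"))
```

The resulting fixed-length vector is what a text classifier such as an SVM would take as input, regardless of document length.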

Shri Bharathi and Geetha (2017) [7] used RSS (Really Simple Syndication) feeds for prediction on the stock of Arab Bank Company. The RSS feeds collect stock market news, which serves as the dataset. The resulting system automatically identifies the opinions in the news with the help of the RSS feeds and then predicts the stock movement. All RSS feeds are first stored as documents in an input sentence module. A sentence splitting module cleans the news feeds and splits them into parsed sentences. The NLP module identifies and extracts subjective information from the source material. A part-of-speech tagger reads the text and assigns a part of speech, such as noun, adjective or verb, to each word. A dictionary-based approach is used to find opinion words and their polarities; it uses the antonyms, synonyms and hierarchies in WordNet to determine word sentiment. The sentence polarity module calculates the polarity of each sentence, and the polarity score is classified as positive, neutral or negative. If the opinion is positive, the stock is predicted to go up; otherwise it is predicted to go down. The polarity values of 499 sentences covering one month are calculated and classified as positive, negative or neutral. Based on this, the system helps traders decide whether to buy or sell the stock.
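The sentence splitting and sentence polarity steps can be illustrated with a short sketch. The word lists, function names and sample text are hypothetical, and simple word counting stands in for the WordNet-based dictionary lookup described above.

```python
import re

# Hypothetical opinion-word lists standing in for a WordNet-derived dictionary.
POSITIVE_WORDS = {"gain", "growth", "profit", "rise"}
NEGATIVE_WORDS = {"loss", "decline", "fall", "debt"}


def sentence_polarity(sentence):
    """Score a sentence as (#positive words) - (#negative words)."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return sum((w in POSITIVE_WORDS) - (w in NEGATIVE_WORDS) for w in words)


def polarity_label(score):
    """Map a numeric polarity score to positive, negative or neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def classify_feed(text):
    """Split a news item into sentences and label each one."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    sentences = [part.strip() for part in parts if part.strip()]
    return [(s, polarity_label(sentence_polarity(s))) for s in sentences]


feed = ("Profit growth beat forecasts. "
        "Debt levels rise while margins fall. "
        "The board meets on Monday.")
labels = [label for _, label in classify_feed(feed)]
print(labels)
```

Counting the labels over all sentences in a period gives the kind of positive, negative and neutral totals the summary describes for the one-month sample.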

Lee, Surdeanu, Maccartney, and Jurafsky (2014) [3] used 8-K financial reports for all S&P 500 companies between 2002 and 2012 in their analysis of stock prediction. For linguistic features, lemmatized unigrams are used. Features occurring fewer than 10 times are removed, and PMI (pointwise mutual information) is used to select which linguistic features to retain. NMF (non-negative matrix factorization) is applied for dimensionality reduction, and the resulting vector, combined with the baseline features, is fed into a random forest classifier with 2000 trees and compared with the baseline models. The first baseline is a deterministic system that predicts UP when the actual earnings are better than expected. The second baseline uses 21 financial features, and the unigram model uses unigram features in addition to those 21 financial features. An ensemble model is then built based on the dimension of NMF. Incorporating linguistic features improves predictive performance over non-linguistic features in the short term. The SentiWordNet lexicon is used to score words: positive words receive high scores and negative words receive low scores. Bigrams and word clustering are also used to combine two or more words, but the bigram model does not improve performance significantly over the unigram model. The unigram model thus improves short-term prediction accuracy.
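PMI-based feature scoring can be sketched as below. This assumes PMI is computed between a unigram's presence in a document and the document's class label, which is one common choice; the summary does not state exactly how the cited authors applied PMI, and the sample documents and labels are hypothetical.

```python
import math
from collections import Counter


def pmi_scores(documents, labels):
    """Return {(word, label): PMI} from document-level co-occurrence counts.

    PMI(word, label) = log( P(word, label) / (P(word) * P(label)) ).
    Pairs that never co-occur are omitted rather than scored as -infinity.
    """
    n_docs = len(documents)
    label_counts = Counter(labels)
    word_counts = Counter()
    joint_counts = Counter()
    for tokens, label in zip(documents, labels):
        for word in set(tokens):  # Presence per document, not raw frequency.
            word_counts[word] += 1
            joint_counts[(word, label)] += 1
    scores = {}
    for (word, label), joint in joint_counts.items():
        p_joint = joint / n_docs
        p_word = word_counts[word] / n_docs
        p_label = label_counts[label] / n_docs
        scores[(word, label)] = math.log(p_joint / (p_word * p_label))
    return scores


# Hypothetical tokenized reports with UP/DOWN outcome labels.
docs = [["profit", "beat"], ["profit", "rise"], ["loss", "miss"], ["loss", "profit"]]
outcomes = ["UP", "UP", "DOWN", "DOWN"]
scores = pmi_scores(docs, outcomes)
# Keep the features most strongly associated with some label.
top = sorted(scores, key=scores.get, reverse=True)[:3]
print(top)
```

A positive score means the word and label co-occur more often than independence would predict, so ranking by PMI and keeping the top entries is one way to retain informative unigrams.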

Zhai, Cohen, and Atreya (2011) [8] developed Java code and used the Stanford Classifier to quickly analyse financial news articles and, with this model, predict the S&P 500 index. They used every article published in The New York Times from January 1987 to June 2007, annotated with date, category and a set of tags describing the content of the article. To classify the natural language sentiment of news articles, two methods for determining sentiment were tested: manual classification and automatic classification using stock market results. Manual classification involved reading each article and assigning it a sentiment tag: positive, neutral or negative. A class, NYT Manual Classifier, was built to aid this process. Because manual classification is time-consuming, only two months' worth of articles could be classified, and January and June 2006 were chosen. January contains many articles summarizing the results of the previous year and speculating on the upcoming year, while in June journalists may be more focused on day-to-day movements of the stock market. The automatic approach used market movement in the form of the log return: the log of today's close divided by yesterday's close. The system provides interesting analysis of market sentiment in hindsight, but it is less effective when used for predictive purposes. The sentiment results it produces could instead be an input to another trading system, or simply be given to human traders to aid their judgment.
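The log return used for automatic labelling is log(today's close / yesterday's close). A minimal sketch follows; the closing prices, the label names and the `threshold` parameter are hypothetical, since the summary does not say how returns were mapped to sentiment tags.

```python
import math


def log_returns(closes):
    """Return log(today's close / yesterday's close) for consecutive days."""
    return [math.log(today / yesterday)
            for yesterday, today in zip(closes, closes[1:])]


def return_label(log_return, threshold=0.0):
    """Label a day by its log return; `threshold` is an assumed neutral band."""
    if log_return > threshold:
        return "positive"
    if log_return < -threshold:
        return "negative"
    return "neutral"


# Hypothetical closing prices for four consecutive trading days.
closes = [100.0, 102.0, 102.0, 99.0]
returns = log_returns(closes)
day_labels = [return_label(r) for r in returns]
print(day_labels)
```

Articles published on a given day would then inherit that day's label, which avoids manual annotation at the cost of assuming the news and the market move together.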

Zhang and Skiena (2010) [9] used quantitative media data (blogs and news, as a comparison) generated by a large-scale natural language processing system and performed a comparative study of blogs and news, a large-scale analysis, and sentiment-oriented equity trading. They used Stocks
