Electronic copy available at: http://ssrn.com/abstract=2621156
Managerial Finance
Vol. 35 No. 5, 2009
pp. 427-438
© Emerald Group Publishing Limited
0307-4358
DOI 10.1108/03074350910949790

# An automation algorithm for harvesting capital market information from the web
Pankaj Agrrawal
Department of Finance, University of Maine, Orono, Maine, USA
Abstract
Purpose – The purpose of this paper is to develop an algorithm to harvest user-specified information from finance portals and compile it into machine-readable datasets for quantitative analysis.
Design/methodology/approach – The Visual Basic macro language in Microsoft Excel is used to develop code that is not constrained by Excel's single-query web function. The core of the algorithm is built around splitting the URL connector line and placing into it a continuously updating variable, through which as many tickers as appear in the input list are looped. The output is then written to non-overlapping cells.
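The paper's implementation is an Excel VBA macro; as a rough illustration of the same idea in Python, the sketch below splits a query URL around the ticker slot, loops a ticker list through that slot, and writes each record to its own non-overlapping output row. The URL template and field names are hypothetical, not those of any real finance portal.

```python
# Illustrative sketch only: the paper's actual code is an Excel VBA macro.
# The core idea it mirrors: split the URL connector line around the ticker,
# loop tickers through that slot, write each record to a non-overlapping row.
import csv
import io

# Hypothetical URL halves; a real portal would define its own query string.
URL_PREFIX = "http://finance.example.com/quote?s="
URL_SUFFIX = "&fields=price,pe,beta"

def build_urls(tickers):
    """Place each ticker from the input list into the split URL."""
    return [URL_PREFIX + t + URL_SUFFIX for t in tickers]

def write_records(records, out):
    """Write one ticker per row so outputs never overwrite each other."""
    writer = csv.writer(out)
    writer.writerow(["ticker", "url"])
    for ticker, url in records:
        writer.writerow([ticker, url])

tickers = ["IBM", "MSFT", "XOM"]
records = list(zip(tickers, build_urls(tickers)))
buf = io.StringIO()
write_records(records, buf)
```

In the VBA original, the same effect is achieved by concatenating the two halves of the connection string around a loop variable and offsetting the destination cell on each iteration.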
Findings – Numerical information placed on major finance websites can be harvested into
structured machine-readable datasets by applying this algorithm.
Research limitations/implications – One significant change in Microsoft Excel 2007 is that the worksheet is expanded from 2^24 to 2^34 cells, or, to be more specific, from 256 (IV) columns × 65,536 rows (2^8 × 2^16) to 16,384 (XFD) columns × 1,048,576 rows (2^14 × 2^20). These new limits, while allowing for a larger number of tickers, still constrain a single worksheet to 16,384 columns. For five fields per ticker, that translates into roughly 3,200 ticker symbols.
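The grid arithmetic above can be checked directly; a few lines of Python (illustrative only, the paper itself works in Excel VBA) confirm the cell counts and the roughly 3,200-ticker ceiling at five fields per ticker.

```python
# Excel worksheet grid limits quoted in the text, verified arithmetically.
cols_2003, rows_2003 = 2**8, 2**16    # 256 (IV) columns x 65,536 rows
cols_2007, rows_2007 = 2**14, 2**20   # 16,384 (XFD) columns x 1,048,576 rows

assert cols_2003 * rows_2003 == 2**24  # pre-2007 cell count
assert cols_2007 * rows_2007 == 2**34  # Excel 2007 cell count

fields_per_ticker = 5
max_tickers = cols_2007 // fields_per_ticker  # 3,276, i.e. "roughly 3,200"
```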
Practical implications – The algorithm extends user accessibility to websites that do not provide
the facility of simultaneous downloading of information on multiple stock tickers. Furthermore, the
procedure automates the downloading of multiple pieces of information (fields) and entire tables per
ticker (record).
Originality/value – An exhaustive literature search did not find any paper that discusses a multiple
ticker algorithm for web harvesting.
Keywords Information retrieval, Worldwide web, Programming and algorithm theory, Capital
markets
Paper type Technical paper
The internet has altered production, consumption, transportation, communication,
research and a whole range of other aspects of human activity. Physical proximity to the
floors of the New York or London Stock Exchanges is no longer necessary for successful
portfolio management and financial research. The speed and timeliness with which
capital market information is delivered over the internet has effectively removed any
logistical advantages that accrue to investors located in the proximity of the marketplace.
Some finance websites openly promote public access while others require premium
access, thus making the access to embedded data either costly or inefficient. To the
quantitatively inclined student of finance who wants to research a portfolio of
securities and requires aggregated data sets, the usefulness of ticker-by-ticker data on
a webpage is rather limited. Valuation metrics such as price to earnings, price to sales,
price to cash flow, sector exposures and rolled-up risk measures such as aggregate beta
or total volatility at the portfolio level, fundamental ratios and betas for exchange
The current issue and full text archive of this journal is available at
www.emeraldinsight.com/0307-4358.htm
The author wishes to thank an anonymous referee and the editor for suggesting numerous
improvements that have benefited the paper. The usual disclaimer applies.