Electronic copy available at: http://ssrn.com/abstract=2621156
Managerial Finance
Vol. 35 No. 5, 2009
pp. 427-438
© Emerald Group Publishing Limited
0307-4358
DOI 10.1108/03074350910949790

# An automation algorithm for harvesting capital market information from the web
Pankaj Agrrawal
Department of Finance, University of Maine, Orono, Maine, USA
Abstract
Purpose – The purpose of this paper is to develop an algorithm to harvest user-specified information from finance portals and compile it into machine-readable datasets for quantitative analysis.
Design/methodology/approach – The Visual Basic macro language in Microsoft Excel is used to develop code that is not constrained by Excel's single-query web function. The core of the algorithm is built around splitting the URL connector line and placing into it a continuously updating variable, through which as many tickers as appear in the input list are looped. The output is then written to non-overlapping cells.
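The paper's implementation is an Excel VBA macro; as a rough illustration of the same idea in Python, the sketch below splits a query URL around the ticker slot, loops a ticker list through that slot, and writes each record to its own non-overlapping output row. The URL template and field names are hypothetical, not those of any real finance portal.

```python
# Illustrative sketch only: the paper's actual code is an Excel VBA macro.
# The core idea it mirrors: split the URL connector line around the ticker,
# loop tickers through that slot, write each record to a non-overlapping row.
import csv
import io

# Hypothetical URL halves; a real portal would define its own query string.
URL_PREFIX = "http://finance.example.com/quote?s="
URL_SUFFIX = "&fields=price,pe,beta"

def build_urls(tickers):
    """Place each ticker from the input list into the split URL."""
    return [URL_PREFIX + t + URL_SUFFIX for t in tickers]

def write_records(records, out):
    """Write one ticker per row so outputs never overwrite each other."""
    writer = csv.writer(out)
    writer.writerow(["ticker", "url"])
    for ticker, url in records:
        writer.writerow([ticker, url])

tickers = ["IBM", "MSFT", "XOM"]
records = list(zip(tickers, build_urls(tickers)))
buf = io.StringIO()
write_records(records, buf)
```

In the VBA original, the same effect is achieved by concatenating the two halves of the connection string around a loop variable and offsetting the destination cell on each iteration.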
Findings – Numerical information placed on major finance websites can be harvested into
structured machine-readable datasets by applying this algorithm.
Research limitations/implications – One significant change in Microsoft Excel 2007 is that the worksheet is expanded from 2^24 to 2^34 cells, or, to be more specific, from 256 (IV) columns × 65,536 rows (2^8 × 2^16) to 16,384 (XFD) columns × 1,048,576 rows (2^14 × 2^20). These new limits, while allowing for a larger number of tickers, still constrain a single worksheet to 16,384 columns. For five fields per ticker, that translates into roughly 3,200 ticker symbols.
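The grid arithmetic above can be checked directly; a few lines of Python (illustrative only, the paper itself works in Excel VBA) confirm the cell counts and the roughly 3,200-ticker ceiling at five fields per ticker.

```python
# Excel worksheet grid limits quoted in the text, verified arithmetically.
cols_2003, rows_2003 = 2**8, 2**16    # 256 (IV) columns x 65,536 rows
cols_2007, rows_2007 = 2**14, 2**20   # 16,384 (XFD) columns x 1,048,576 rows

assert cols_2003 * rows_2003 == 2**24  # pre-2007 cell count
assert cols_2007 * rows_2007 == 2**34  # Excel 2007 cell count

fields_per_ticker = 5
max_tickers = cols_2007 // fields_per_ticker  # 3,276, i.e. "roughly 3,200"
```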
Practical implications – The algorithm extends user accessibility to websites that do not provide
the facility of simultaneous downloading of information on multiple stock tickers. Furthermore, the
procedure automates the downloading of multiple pieces of information (fields) and entire tables per
ticker (record).
Originality/value – An exhaustive literature search did not find any paper that discusses a multiple
ticker algorithm for web harvesting.
Keywords Information retrieval, Worldwide web, Programming and algorithm theory, Capital
markets
Paper type Technical paper
The internet has altered production, consumption, transportation, communication,
research and a whole range of other aspects of human activity. Physical proximity to the
floors of the New York or London Stock Exchanges is no longer necessary for successful
portfolio management and financial research. The speed and timeliness with which
capital market information is delivered over the internet has effectively removed any
logistical advantages that accrue to investors located in the proximity of the marketplace.
Some finance websites openly promote public access while others require premium
access, thus making the access to embedded data either costly or inefficient. To the
quantitatively inclined student of finance who wants to research a portfolio of
securities and requires aggregated data sets, the usefulness of ticker-by-ticker data on
a webpage is rather limited. Valuation metrics such as price to earnings, price to sales,
price to cash flow, sector exposures and rolled-up risk measures such as aggregate beta
or total volatility at the portfolio level, fundamental ratios and betas for exchange
The current issue and full text archive of this journal is available at
www.emeraldinsight.com/0307-4358.htm
The author wishes to thank an anonymous referee and the editor for suggesting numerous
improvements that have benefited the paper. The usual disclaimer applies.