# Machine Learning Nanodegree
## Capstone Project
### Project: Stock Price Prediction
**Disclaimer**: All historical stock price data was downloaded from Yahoo Finance.
**Disclaimer**: `lstm.py` was provided as part of the project files.
---
# Definition
## Problem Statement
As stated in the “Problem Statement” section of the Capstone project description, the task is to build a predictor that uses historical data from online sources to predict future prices. The only input to the model at prediction time should be the date range. The predicted prices are then compared against the actual prices for the same date range in the testing period.
## Metrics
The main metric for this project is the R^2 score between the actual prices in the testing period and the prices predicted by the model for the same period.
A second, indicative metric is the absolute percent difference between actual and predicted prices. However, for machine learning purposes (training and testing), the R^2 score is the more reliable measure.
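As a minimal sketch of the two metrics, the snippet below computes both on a pair of small arrays. It assumes scikit-learn's `r2_score`; the arrays `actual` and `predicted` are hypothetical values chosen for illustration and are not project data.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical prices for illustration only; not taken from the project data.
actual = np.array([100.0, 102.0, 101.0, 105.0])
predicted = np.array([99.0, 103.0, 102.0, 104.0])

# R^2 = 1 - SS_res / SS_tot, where SS_tot is measured around the mean of `actual`.
r2 = r2_score(actual, predicted)

# Mean absolute percent difference between actual and predicted prices.
pct_diff = np.mean(np.abs(actual - predicted) / np.abs(actual)) * 100.0

print("R^2: {:.4f}".format(r2))
print("Mean absolute percent difference: {:.2f}%".format(pct_diff))
```

Note that R^2 can be negative when the predictions fit worse than a constant equal to the mean of the actual prices, which is one reason it is a stricter measure than a raw percent difference.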
---
# Analysis
## Data Exploration
First, let's explore the data by downloading stock prices for Google.
For that purpose, I have built a class called StockRegressor that can download the data and store it in a Pandas DataFrame.
The first step is to import the class.
```python
%matplotlib inline
import numpy as np
# Initialize the numpy seed so that we get reproducible results, especially with Keras
np.random.seed(0)
import time
import datetime
from calendar import monthrange
import pandas as pd
from IPython.display import display
from IPython.display import clear_output
# Note: in recent statsmodels releases, ARIMA lives in statsmodels.tsa.arima.model
from statsmodels.tsa.arima_model import ARIMA
from sklearn.metrics import mean_squared_error
import warnings
warnings.filterwarnings('ignore')
from StockRegressor import StockRegressor
from StockRegressor import StockGridSearch
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (15,8)
```
### The First StockRegressor Object
Let's get our first batch of historical price data.
After downloading the prices from the Yahoo Finance web service, the StockRegressor instance below saves the historical prices into the pricing_info DataFrame. As a first processing step, the index of the DataFrame is changed from 'dates' to 'timeline', which is an integer index.
An integer index is easier to process because the dates are trading dates and are not consecutive: they do not include weekends or holidays. This can be seen in the gap below between 02 Sep 2016 and 06 Sep 2016, which corresponds to the Labor Day long weekend (Labor Day fell on 05 Sep 2016).
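The re-indexing idea can be sketched as follows. This is not the StockRegressor implementation, only an illustration with plain pandas; the four dates and rounded adjusted close prices are the ones that appear in this section's sample output.

```python
import pandas as pd

# Trading dates and rounded 'Adj Close' values from this section's sample rows.
dates = pd.to_datetime(['2016-09-01', '2016-09-02', '2016-09-06', '2016-09-07'])
df = pd.DataFrame({'Adj Close': [768.78, 771.46, 780.08, 780.35]}, index=dates)
df.index.name = 'dates'

# Move the dates into a regular column and index by a sequential integer instead.
df = df.reset_index()
df['timeline'] = range(len(df))
df = df.set_index('timeline', drop=False)

# Calendar days between consecutive trading rows; the long weekend shows up as 4.
gaps = df['dates'].diff().dt.days
print(df)
print(gaps)
```

With the integer index, "the next trading day" is always `timeline + 1`, regardless of how many calendar days separate the two rows.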
> **Note:** There might be a bug in the Pandas library that causes an intermittent error with the Yahoo Finance web call. The bug can be traced to the file /anaconda/envs/**your_environment**/lib/python3.5/site-packages/pandas/core/indexes/datetimes.py, at line 1050.
> The line causing the error is `if this.freq is None:`. Another condition should be inserted before it to test for the `freq` attribute, such as `if hasattr(this, 'freq'):`.
>
> **Note:** The fixed datetimes.py file is included with the submission.
```python
stock = StockRegressor('GOOG', dates= ['2014-10-01', '2016-04-30'])
display(stock.pricing_info[484:488])
```
Getting pricing information for GOOG for the period 2014-10-01 to 2016-09-27
Found a pricing file with wide range of dates, reading ... Stock-GOOG-1995-12-27-2017-09-05.csv
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>Open</th>
<th>High</th>
<th>Low</th>
<th>Close</th>
<th>Adj Close</th>
<th>Volume</th>
<th>dates</th>
<th>timeline</th>
</tr>
<tr>
<th>timeline</th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<th>484</th>
<td>769.250000</td>
<td>771.020020</td>
<td>764.299988</td>
<td>768.780029</td>
<td>768.780029</td>
<td>925100</td>
<td>2016-09-01</td>
<td>484</td>
</tr>
<tr>
<th>485</th>
<td>773.010010</td>
<td>773.919983</td>
<td>768.409973</td>
<td>771.460022</td>
<td>771.460022</td>
<td>1072700</td>
<td>2016-09-02</td>
<td>485</td>
</tr>
<tr>
<th>486</th>
<td>773.450012</td>
<td>782.000000</td>
<td>771.000000</td>
<td>780.080017</td>
<td>780.080017</td>
<td>1442800</td>
<td>2016-09-06</td>
<td>486</td>
</tr>
<tr>
<th>487</th>
<td>780.000000</td>
<td>782.729980</td>
<td>776.200012</td>
<td>780.349976</td>
<td>780.349976</td>
<td>893700</td>
<td>2016-09-07</td>
<td>487</td>
</tr>
</tbody>
</table>
</div>
```python
stock.adj_close_price['dates'].iloc[stock.testing_end_date]
```
Timestamp('2016-07-13 00:00:00')
### The Impact of the 'Volume' Feature
The next step is to eliminate the columns that are not needed. The columns 'Open', 'High', 'Low', and 'Close' are all discarded, because we will work with the 'Adj Close' prices only.
For 'Volume', let's explore its relevance below.
From the table and graph below, we conclude that Volume has very little correlation with prices (a correlation coefficient of about -0.065), so we drop it from the discussion from now on.
There may be some correlation between spikes in Volume and abrupt changes in prices. That would be logical, since higher trading volumes might lead to larger price fluctuations. However, these volume spikes happen on the same day as the price changes, and so have little predictive power. This might be a topic for future exploration.
---
```python
from sklearn.preprocessing import MinMaxScaler
# Separate scalers so that volume and price are each mapped to [0, 1] independently
scaler_volume = MinMaxScaler(copy=True, feature_range=(0, 1))
scaler_price = MinMaxScaler(copy=True, feature_range=(0, 1))
prices = stock.pricing_info.copy()
prices = prices.drop(labels=['Open', 'High', 'Low', 'Close', 'dates', 'timeline'], axis=1)
# Series.reshape is not available in recent pandas; reshape the underlying numpy array instead
volume_2d = prices['Volume'].values.reshape(-1, 1)
price_2d = prices['Adj Close'].values.reshape(-1, 1)
# Fit each scaler and write the scaled values back as flat columns
prices['Volume'] = scaler_volume.fit_transform(volume_2d).ravel()
prices['Adj Close'] = scaler_price.fit_transform(price_2d).ravel()
print("\nCorrelation between Volume and Prices:")
display(prices.corr())
prices.plot(kind='scatter', x='Adj Close', y='Volume')
```
Correlation between Volume and Prices:
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>Adj Close</th>
<th>Volume</th>
</tr>
</thead>
<tbody>
<tr>
<th>Adj Close</th>
<td>1.00000</td>
<td>-0.06493</td>
</tr>
<tr>
<th>Volume</th>
<td>-0.06493</td>
<td>1.00000</td>
</tr>
</tbody>
</table>
</div>
<matplotlib.axes._subplots.AxesSubplot at 0x1100e5ac8>
![png](imgs/output_12_3.png)
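The remark above about same-day volume spikes could be probed with a sketch like the following, which compares the same-day and one-day-lagged correlation between volume and absolute daily returns. The data here is synthetic and built so that volume reacts to price moves on the same day; `volume_return_corr` is a hypothetical helper and is not part of StockRegressor.

```python
import numpy as np
import pandas as pd

def volume_return_corr(frame, lag=0):
    """Correlation between volume shifted by `lag` days and absolute daily returns."""
    abs_ret = frame['Adj Close'].pct_change().abs()
    return frame['Volume'].shift(lag).corr(abs_ret)

# Synthetic data for illustration only: independent daily shocks drive the price,
# and volume is constructed to rise on the same day as large moves.
rng = np.random.RandomState(0)
n = 500
shocks = rng.normal(0, 0.01, n)
synthetic_prices = 100 * np.cumprod(1 + shocks)
synthetic_volume = 1e6 * (1 + 50 * np.abs(shocks)) + rng.normal(0, 1e4, n)
frame = pd.DataFrame({'Adj Close': synthetic_prices, 'Volume': synthetic_volume})

same_day = volume_return_corr(frame, lag=0)  # volume vs. the same day's move
lagged = volume_return_corr(frame, lag=1)    # yesterday's volume vs. today's move

print("same-day correlation: {:.3f}".format(same_day))
print("one-day-lagged correlation: {:.3f}".format(lagged))
```

On real data, a high same-day value together with a lagged value near zero would support the claim that volume carries little predictive power. The lagged value is the one that matters for forecasting.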
## Exploratory Visualization
Now let's explore the historical pricing. For that purpose, we have built two special-purpose plotting functions into the StockRegressor class.
The first plotting function shows the "learning_df" DataFrame. This is the DataFrame that stores all "workspace" data, i.e. dates, indexes, prices, and the predictions of multiple algorithms.
The second plotting function, which is used less frequently, plots prices with Bollinger bands. It is for price exploration only.
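For readers unfamiliar with Bollinger bands, here is a minimal sketch of how they are commonly computed: a rolling mean plus and minus a multiple of the rolling standard deviation. The 20-day window and the factor of 2 are conventional defaults assumed here, not values confirmed by the StockRegressor code, and `bollinger_bands` is a hypothetical helper applied to synthetic prices.

```python
import numpy as np
import pandas as pd

def bollinger_bands(prices, window=20, num_std=2.0):
    """Return the middle, upper, and lower bands for a price Series."""
    rolling = prices.rolling(window=window)
    middle = rolling.mean()
    # pandas uses the sample standard deviation (ddof=1) by default.
    spread = num_std * rolling.std()
    return pd.DataFrame({
        'middle': middle,
        'upper': middle + spread,
        'lower': middle - spread,
    })

# Synthetic random-walk prices for illustration only.
rng = np.random.RandomState(0)
prices = pd.Series(100 + np.cumsum(rng.normal(0, 1, 120)), name='Adj Close')

bands = bollinger_bands(prices)
print(bands.dropna().head())
```

The first `window - 1` rows are NaN because the rolling window is not yet full. Plotting `prices` together with `bands` gives the familiar envelope around the price curve.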
Below, we call those two functions. As we haven't trained the StockRegressor, the plot_learning_data_frame() function will show the learning_df dataframe with only the pricing, and a vertical r