# Deep learning project - Time Series Data Prediction (Matlab, LSTM)
Writer : Harim Kang
# Organized Blog
A Korean version of this README is posted at the address below.
[https://davinci-ai.tistory.com/11](https://davinci-ai.tistory.com/11)
# Project subject
The original topic was to analyze online product price data to predict current product prices.
However, because time and computing power were limited relative to the vast amount of data, I narrowed the project to analyzing the online price data of jeans and predicting the jeans price in the near future.
I chose jeans out of the many items because they have more data than other items and are worn in every season. (Cell phones and TVs were dropped in favor of jeans because they lacked historical data.)
I will look for a way to cover more items later.
## Usage data
[https://www.data.go.kr/dataset/15004449/fileData.do](https://www.data.go.kr/dataset/15004449/fileData.do)
Each record in the online price collection data consists of 8 fields, including the collection date, item name, and sales price.
![./images/Untitled.png](./images/Untitled.png)
The data is available from January 2014 to October 2019.
For the analysis, I used the data from January 2015 to October 2019.
## Data analysis process
1. Data Purification
2. Explore the data
3. Purify additional data for analysis
4. Model Selection for Prediction
5. Data prediction
6. Predictive Assessment (RMSE)
7. Semantic Analysis
## Data refining
Because my laptop's computing power is small compared to the amount of data (about 100 million records), I decided to work with the daily average selling price.
![./images/Untitled%201.png](./images/Untitled%201.png)
![./images/Untitled%202.png](./images/Untitled%202.png)
The code retrieves the data by date and keeps only three of the eight fields: the collection date, item name, and selling price. It then selects the records whose item name is jeans, averages their prices, and writes the result to a new data set.
This step took a while because there was much more data than I expected. It took more than 30 minutes per year of data, and a computer with a faster CPU would likely do better.
Using the code above, I obtained a refined data set containing the average jeans sale price for each day.
![./images/Untitled%203.png](./images/Untitled%203.png)
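The averaging step described above can be sketched as follows. This is a minimal illustration on a made-up table, not the project's extraction script: the column name item_name and the sample values are assumptions, while collect_day and sales_price follow the names used later in this README. It also assumes a MATLAB release that provides groupsummary (R2018a or later).

```matlab
% Toy stand-in for the raw records (values are made up for illustration).
% item_name is a hypothetical column name; collect_day and sales_price
% match the variable names used elsewhere in this README.
collect_day = ["2015-01-01"; "2015-01-01"; "2015-01-02"; "2015-01-02"];
item_name   = ["jeans"; "jeans"; "jeans"; "tv"];
sales_price = [40000; 50000; 60000; 900000];
raw = table(collect_day, item_name, sales_price);

% Keep only the jeans rows, then average the price per collection day.
jeans = raw(raw.item_name == "jeans", :);
daily = groupsummary(jeans, "collect_day", "mean", "sales_price");

% groupsummary names the result column mean_sales_price.
disp(daily(:, {'collect_day', 'mean_sales_price'}))
```

Grouping in a single call avoids an explicit loop over dates, which may matter when a year of raw data takes tens of minutes to process.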
## Explore data
![./images/Untitled%204.png](./images/Untitled%204.png)
This is a graph of the average jeans price that I plotted from the refined data.
The data summary is shown below.
![./images/Untitled%205.png](./images/Untitled%205.png)
- Item: Jeans
- Period: January 01, 2015 ~ October 31, 2019 (total 1765 days)
- Minimum value: 32732 won
- Average value: 51998 won
- Maximum value: 166220 won
![./images/Untitled%206.png](./images/Untitled%206.png)
The graph above overlays the prices by year. The result is not very clear.
## Purify additional data for analysis
Load the newly created daily average jeans prices and fill in the missing values for better training results.
% Jeans mean data refinement
jean_data = readtable('mean\tt.xlsx');
% Fill the NaN value with the Nearest value.
jean_data.sales_price = fillmissing(jean_data.sales_price, 'nearest');
lenofdata = length(jean_data.sales_price);
for i=1 : length(jean_data.collect_day)
jean_data.collect_day(i) = strip(jean_data.collect_day(i),"'");
end
Y = jean_data.sales_price;
data = Y';
Matlab fills missing values (NaN) in data with a function called fillmissing. I passed 'nearest' as the method, which replaces each missing value with the nearest non-missing value, because I assumed the data has a trend. The for loop above uses the strip function to remove the single-quote characters from the collection date strings.
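To make the effect of the 'nearest' method concrete, here is a small sketch on a made-up vector (the values are assumptions for illustration only). It also shows 'linear' for comparison, which is another method fillmissing accepts.

```matlab
% Toy price series with two missing days (NaN); values are made up.
x = [1 NaN NaN 4 5];

% 'nearest' copies the closest non-missing neighbor into each gap.
filledNearest = fillmissing(x, 'nearest');   % [1 1 4 4 5]

% 'linear' interpolates between the neighbors instead.
filledLinear = fillmissing(x, 'linear');     % [1 2 3 4 5]

disp([filledNearest; filledLinear])
```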
% 2015.01.01 ~ 2019.05.06 (90%) : Training Data Set
% 2019.05.07 ~ 2019.10.31 (10%) : Test Data Set
numTimeStepsTrain = floor(0.9*numel(data));
dataTrain = data(1:numTimeStepsTrain+1);
dataTest = data(numTimeStepsTrain+1:end);
To train and evaluate the model, I split the data into training and test data sets. Since the amount of data is not very large, I used a 9:1 ratio so that most of the data is available for training.
% Standardize sales_price to zero mean and unit variance (Training Data Set)
mu = mean(dataTrain);
sig = std(dataTrain);
dataTrainStandardized = (dataTrain - mu) / sig;
XTrain = dataTrainStandardized(1:end-1);
YTrain = dataTrainStandardized(2:end);
Also, the selling prices are large numbers (tens of thousands of won), so I standardized them to help training. The prices are converted to relative values with zero mean and unit variance, using the mean and standard deviation of the training set.
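The following sketch illustrates what this transform does, using made-up prices in place of the real training data (the numbers are assumptions). It shows that the standardized values have mean 0 and standard deviation 1, and that the transform can be inverted to get back to won.

```matlab
% Made-up prices in won, standing in for dataTrain.
dataTrain = [32000 48000 52000 56000 72000];

mu  = mean(dataTrain);
sig = std(dataTrain);

% Standardize: the result has mean 0 and standard deviation 1,
% so values can be negative or larger than 1.
z = (dataTrain - mu) / sig;

% Invert the transform to get back to won.
restored = sig * z + mu;

fprintf('mean(z) = %.4f, std(z) = %.4f\n', mean(z), std(z));
fprintf('max restore error = %g\n', max(abs(restored - dataTrain)));
```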
## Model Selection for Prediction
I chose an LSTM (Long Short-Term Memory) model and applied it to the data. My data is ordered by date, one value per day. This is called 'Time Series' data, which is one type of sequence data. Sequence data is commonly modeled with a deep learning model called a Recurrent Neural Network (RNN).
![./images/Untitled%207.png](./images/Untitled%207.png)
RNN
However, earlier data can still play an important role in the prediction. With a plain RNN, the farther back an input is, the more of it the network forgets.
![./images/Untitled%208.png](./images/Untitled%208.png)
The LSTM adds input, forget, and output gates to the memory cells in the hidden layer, so it can clear out unnecessary memory and decide what to remember. That is why an LSTM is better suited to time series than a plain RNN.
![./images/Untitled%209.png](./images/Untitled%209.png)
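For reference, one common formulation of the LSTM cell update is given below. The notation is an assumption chosen for this sketch and is not taken from the project code: $W$, $R$, and $b$ are input weights, recurrent weights, and biases, $\sigma$ is the logistic sigmoid, and $\odot$ is element-wise multiplication.

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + R_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + R_f h_{t-1} + b_f) \\
g_t &= \tanh(W_g x_t + R_g h_{t-1} + b_g) \\
o_t &= \sigma(W_o x_t + R_o h_{t-1} + b_o) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

The forget gate $f_t$ scales how much of the previous cell state $c_{t-1}$ is kept, the input gate $i_t$ scales how much of the candidate $g_t$ is written, and the output gate $o_t$ controls how much of the cell state is exposed as the hidden state $h_t$.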
I will add more detailed descriptions of the algorithm as I study deep learning further.
In Matlab, set the LSTM options with the following code. It is the code from the existing Matlab LSTM tutorial with MaxEpochs increased to 500.
%LSTM Net Architecture Def
numFeatures = 1;
numResponses = 1;
numHiddenUnits = 200;
layers = [ ...
sequenceInputLayer(numFeatures)
lstmLayer(numHiddenUnits)
fullyConnectedLayer(numResponses)
regressionLayer];
options = trainingOptions('adam', ...
'MaxEpochs',500, ...
'GradientThreshold',1, ...
'InitialLearnRate',0.005, ...
'LearnRateSchedule','piecewise', ...
'LearnRateDropPeriod',125, ...
'LearnRateDropFactor',0.2, ...
'Verbose',0, ...
'Plots','training-progress');
You can train the network with the above layers and options using the code below.
% Train LSTM Net
net = trainNetwork(XTrain,YTrain,layers,options);
Running the above code trains the model as shown below. During training, the network is evaluated repeatedly on the training data set.
![./images/Untitled%2010.png](./images/Untitled%2010.png)
LSTM Net Architecture Model Training Progress (Epoch: 250)
## Data prediction
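The training data was standardized before training, so standardize the test data the same way, using the mean and standard deviation of the training set. Then start predicting over the test period.
% Standardize sales_price with the training mean and standard deviation (Testing Data Set)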
dataTestStandardized = (dataTest - mu) / sig;
XTest = dataTestStandardized(1:end-1);
net = predictAndUpdateState(net,XTrain);
[net,YPred] = predictAndUpdateState(net,YTrain(end));
% Predict as long as the test period (2019.05.07 ~ 2019.10.31)
numTimeStepsTest = numel(XTest);
for i = 2:numTimeStepsTest
[net,YPred(:,i)] = predictAndUpdateState(net,YPred(:,i-1),'ExecutionEnvironment','cpu');
end
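The first two calls initialize the network state with the training data and produce the first prediction. The for loop then runs once per remaining test time step: each iteration feeds the previous prediction back into the network, stores the new prediction in YPred, and updates the network state.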
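The feedback pattern in that loop can be illustrated without a trained network. In the sketch below, step is a hypothetical stand-in for one network step (in the project this role is played by predictAndUpdateState), and the starting value is made up.

```matlab
% Hypothetical stand-in for one network step: maps the previous value to
% the next one. In the project this role is played by predictAndUpdateState.
step = @(prev) 0.9 * prev;

numSteps = 5;
YPredToy = zeros(1, numSteps);

% The first prediction starts from the last known (training) value.
lastKnown = 1.0;
YPredToy(1) = step(lastKnown);

% Every later prediction uses the previous prediction as its input,
% so no observed test value enters the loop.
for i = 2:numSteps
    YPredToy(i) = step(YPredToy(i-1));
end

disp(YPredToy)
```

Because each step consumes the previous prediction rather than an observed value, errors can accumulate over a long forecast horizon.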
## Predictive evaluation (RMSE)
YPred contains the predicted values and YTest contains the actual values. To assess the performance of the model, we measure the prediction error with MSE or RMSE. I will use RMSE here. (I'll add a description later.)
% RMSE calculation of test data set
YTest = dataTest(2:end);
YTest = (YTest - mu) / sig;
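RMSE is the square root of the mean squared difference between the predictions and the actual values. A minimal sketch on made-up standardized values follows. The vectors and the value of sigToy are illustrative assumptions, not project results. Because the values are standardized, multiplying the RMSE by the training standard deviation expresses it in won.

```matlab
% Made-up standardized values standing in for YTest and YPred.
YTestToy = [0.10 0.20 0.30 0.40];
YPredToy = [0.10 0.30 0.20 0.60];

% Root mean squared error in standardized units.
rmseStd = sqrt(mean((YPredToy - YTestToy).^2));

% Hypothetical training standard deviation, used to convert back to won.
sigToy = 8000;
rmseWon = sigToy * rmseStd;

fprintf('RMSE (standardized) = %.4f\n', rmseStd);
fprintf('RMSE (won)          = %.1f\n', rmseWon);
```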