# Anomaly Detection for Temporal Data using LSTM
This repository contains the code used in my master's thesis on LSTM-based anomaly detection for time series data. The thesis report can be downloaded from [here](http://www.diva-portal.org/smash/record.jsf?pid=diva2:1149130).
**Abstract**
We explore the use of Long short-term memory (LSTM) for anomaly detection in temporal data. Due to the challenges in obtaining labeled anomaly datasets, an unsupervised approach is employed. We train recurrent neural networks (RNNs) with LSTM units to learn the normal time series patterns and predict future values. The resulting prediction errors are modeled to give anomaly scores. We investigate different ways of maintaining LSTM state, and the effect of using a fixed number of time steps on LSTM prediction and detection performance. LSTMs are also compared to feed-forward neural networks with fixed size time windows over inputs. Our experiments, with three real-world datasets, show that while LSTM RNNs are suitable for general purpose time series modeling and anomaly detection, maintaining LSTM state is crucial for getting desired results. Moreover, LSTMs may not be required at all for simple time series.
## Requirements
1. Keras 2.0.3
2. TensorFlow 1.0.0
3. scikit-learn 0.18.2
4. GPyOpt 1.0.3 (only required for hyper-parameter tuning)
## Configuration
First, set the configuration in configuration/config.py. The file contains the following groups of settings:
1. `run_config`: parameters for program execution, such as `data_folder`, `log_file`, etc.
   * Xserver: whether the machine has a display environment. Set it to false when running on remote machines without a display; otherwise the plotting commands will result in an error.
   * experiment_id: an id used to distinguish runs, for example in the logs. The result plots are saved in the folder `imgs/<experiment_id>`.
2. `opt_congfig`: parameters for optimization runs. See the GPyOpt [GitHub repository](https://github.com/SheffieldML/GPyOpt) and [documentation](http://pythonhosted.org/GPyOpt/).
3. `multi_step_lstm_config`: parameters specific to the LSTM network. For the hyper-parameters used, refer to Table 4.1 in the thesis report.
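
The settings described above might look roughly like the sketch below. It is a hypothetical illustration, not the actual contents of configuration/config.py: only the dictionary names and the keys `Xserver`, `experiment_id`, `data_folder` and `log_file` are mentioned in this README, while the `look_back`/`look_ahead` key names, all values, and the helpers `plot_dir` and `choose_matplotlib_backend` are assumptions.

```python
import os

# Hypothetical sketch of configuration/config.py. Only the dictionary names and the
# keys Xserver, experiment_id, data_folder and log_file come from this README; every
# value, and the look_back/look_ahead key names, are illustrative assumptions.
run_config = {
    "Xserver": False,                # False on remote machines without a display
    "experiment_id": "ecg_run_01",   # result plots go to imgs/<experiment_id>
    "data_folder": "resources/data/discords/ECG/",
    "log_file": "logs/ecg_run_01.log",
}

multi_step_lstm_config = {
    "look_back": 100,   # assumed key: input time steps per sample
    "look_ahead": 5,    # assumed key: predicted time steps per sample
}


def plot_dir(config):
    """Return the folder where result plots are saved: imgs/<experiment_id>."""
    return os.path.join("imgs", config["experiment_id"])


def choose_matplotlib_backend(config):
    """Return 'Agg' (non-interactive) when there is no display, else None (keep the default)."""
    return None if config["Xserver"] else "Agg"
```

On a headless machine, the returned backend name would be passed to `matplotlib.use()` before `matplotlib.pyplot` is imported, which avoids the display error mentioned for `Xserver`.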
## How to Execute
1. **Data Pre-processing**: The LSTM network needs data formatted such that each input sample has *look_back* data points and each output sample has *look_ahead* time steps. Python notebooks are used to convert the data into this format and to create the train, test, and validation datasets.
   We provide notebooks for the three datasets used in the thesis; they can serve as examples for new datasets. The three notebooks, along with the processed dataset files, are:
   1. ECG: notebooks/discords_ECG.ipynb, resources/data/discords/ECG/
   2. power_consumption: notebooks/discords_power_consumption.ipynb, resources/data/discords/dutch_power/
   3. machine_temperature: notebooks/NAB_machine_temp.ipynb, resources/data/nab/nab_machine_temperature/

   This step is done in "Part 1" of the corresponding notebook.
2. **Prediction Modeling**: The main LSTM models are in models/lstm.py. Two main files are provided for training the models and generating predictions:
   1. *lstm_predictor.py*: uses the default (stateless) LSTM implementation in Keras.
   2. *stateful_lstm_predictor.py*: uses the stateful LSTM implementation.

   Once the configuration setting `data_folder` has been set correctly, the code looks for the train, test, and validation sets in that folder.
3. **Anomaly Detection**: Running the LSTM models generates predictions for the train, test, and validation sets. For anomaly detection, the prediction errors (residuals) are calculated, modeled using a Gaussian distribution, and then used to set thresholds. This is done in "Part 3" of the corresponding notebook.
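
The windowing performed in "Part 1" of the notebooks (step 1) can be sketched with NumPy as follows. The function names, the chronological split, and the split fractions are illustrative assumptions, not the notebooks' actual code.

```python
import numpy as np


def split_chronologically(series, train_frac=0.6, val_frac=0.2):
    """Split a 1-D series into train/validation/test parts without shuffling.

    The fractions are illustrative assumptions, not the thesis settings.
    """
    series = np.asarray(series, dtype=float)
    n_train = int(round(len(series) * train_frac))
    n_val = int(round(len(series) * val_frac))
    return (series[:n_train],
            series[n_train:n_train + n_val],
            series[n_train + n_val:])


def make_windows(series, look_back, look_ahead):
    """Build (X, y): each X sample holds look_back consecutive points and
    each y sample holds the look_ahead points that follow it."""
    series = np.asarray(series, dtype=float)
    n = len(series) - look_back - look_ahead + 1
    if n <= 0:
        raise ValueError("series is shorter than look_back + look_ahead")
    X = np.stack([series[i:i + look_back] for i in range(n)])
    y = np.stack([series[i + look_back:i + look_back + look_ahead] for i in range(n)])
    # Keras recurrent layers expect inputs shaped (samples, time steps, features).
    return X[:, :, np.newaxis], y
```

For example, `make_windows(np.arange(10), look_back=3, look_ahead=2)` returns `X` with shape `(6, 3, 1)`, whose first sample is `[0, 1, 2]` with target `[3, 4]`. Splitting before windowing keeps the windows of different sets from overlapping in time.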
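
For the stateful predictor (step 2), Keras requires a fixed batch size and, with `stateful=True`, uses the last state of the sample at index i in one batch as the initial state of the sample at index i in the next batch. A minimal sketch of one way to prepare time-ordered samples for that, assuming some of the oldest samples may be dropped; `arrange_for_stateful` is a hypothetical helper, not a function from this repository.

```python
import numpy as np


def arrange_for_stateful(X, y, batch_size):
    """Trim and reorder time-ordered samples for a Keras stateful LSTM.

    Hypothetical helper. Every batch must contain exactly batch_size samples,
    and the state of row i carries over to row i of the next batch. The series
    is cut into batch_size contiguous segments, and batch k holds the k-th
    sample of every segment, so each row follows one segment through time.
    """
    X, y = np.asarray(X), np.asarray(y)
    n = (len(X) // batch_size) * batch_size
    if n == 0:
        raise ValueError("fewer samples than batch_size")
    # Drop the oldest samples so the count is a multiple of batch_size.
    X, y = X[len(X) - n:], y[len(y) - n:]
    steps = n // batch_size
    # order[k * batch_size + i] == i * steps + k
    order = np.arange(n).reshape(batch_size, steps).T.ravel()
    return X[order], y[order]
```

In Keras 2.0 the arranged arrays would be used with a layer such as `LSTM(units, stateful=True, batch_input_shape=(batch_size, look_back, 1))`, trained with `model.fit(..., batch_size=batch_size, shuffle=False)`, and `model.reset_states()` called between passes over the series. With `batch_size=1` the order is unchanged.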
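
Step 3 models the residuals with a Gaussian distribution. A minimal sketch, assuming each residual is the vector of *look_ahead* errors `y_true - y_pred` for one sample and that the anomaly score is the negative log-likelihood under a multivariate Gaussian fitted on validation residuals. The function names and the percentile-based threshold are placeholder assumptions; the notebooks may choose thresholds differently.

```python
import numpy as np


def fit_gaussian(residuals):
    """Estimate the mean vector and covariance matrix of residual vectors.

    residuals has shape (samples, look_ahead); a 1-D array is treated as
    look_ahead == 1.
    """
    r = np.asarray(residuals, dtype=float)
    if r.ndim == 1:
        r = r[:, np.newaxis]
    mu = r.mean(axis=0)
    cov = np.atleast_2d(np.cov(r, rowvar=False))
    return mu, cov


def anomaly_scores(residuals, mu, cov):
    """Negative log-likelihood of each residual under N(mu, cov); higher is more anomalous."""
    r = np.asarray(residuals, dtype=float)
    if r.ndim == 1:
        r = r[:, np.newaxis]
    diff = r - mu
    inv_cov = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance of each row from the mean.
    mahalanobis_sq = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    d = r.shape[1]
    return 0.5 * (mahalanobis_sq + logdet + d * np.log(2.0 * np.pi))


def threshold_from_validation(val_scores, percentile=99.0):
    """Placeholder rule (assumption): flag scores above a validation percentile."""
    return float(np.percentile(val_scores, percentile))
```

Test-set samples whose score exceeds the threshold would be flagged as anomalous. Using the full covariance, rather than independent per-step variances, lets correlated errors across the *look_ahead* steps be taken into account.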