# TODS: Automated Time-series Outlier Detection System
<img width="500" src="./docs/source/img/tods_logo.png" alt="Logo" />
[![Actions Status](https://github.com/datamllab/tods/workflows/Build/badge.svg)](https://github.com/datamllab/tods/actions)
[![codecov](https://codecov.io/gh/datamllab/tods/branch/master/graph/badge.svg?token=M90ZCVTRBF)](https://codecov.io/gh/datamllab/tods)
[中文文档](README.zh-CN.md)
TODS is a full-stack automated machine learning system for outlier detection on multivariate time-series data. TODS provides exhaustive modules for building machine learning-based outlier detection systems: data processing, time series processing, feature analysis (extraction), detection algorithms, and a reinforcement module. These modules cover general-purpose data preprocessing, time series smoothing/transformation, feature extraction from the time and frequency domains, a variety of detection algorithms, and a human-in-the-loop interface for calibrating the system with expert knowledge. TODS supports three common outlier detection scenarios on time-series data: point-wise detection (time points as outliers), pattern-wise detection (subsequences as outliers), and system-wise detection (sets of time series as outliers), and provides a wide range of corresponding algorithms. This package is developed by [DATA Lab @ Rice University](https://cs.rice.edu/~xh37/index.html).
TODS is featured for:
* **Full-Stack Machine Learning System**, which supports exhaustive components from preprocessing and feature extraction to detection algorithms, plus a human-in-the-loop interface.
* **Wide Range of Algorithms**, including all of the point-wise detection algorithms supported by [PyOD](https://github.com/yzhao062/pyod), state-of-the-art pattern-wise (collective) detection algorithms such as [DeepLog](https://www.cs.utah.edu/~lifeifei/papers/deeplog.pdf) and [Telemanom](https://arxiv.org/pdf/1802.04431.pdf), and various ensemble algorithms for system-wise detection.
* **Automated Machine Learning**, which aims to provide a knowledge-free process that constructs an optimal pipeline for the given data by automatically searching for the best combination among all of the existing modules.
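To make the point-wise scenario concrete: it reduces to flagging individual time points that deviate strongly from the rest of the series. The sketch below is a conceptual illustration only, not the TODS API; the z-score rule, function name, and threshold are illustrative assumptions:

```python
# Illustrative point-wise detector: flag points more than `threshold`
# standard deviations away from the series mean (z-score rule).
from statistics import mean, stdev

def zscore_outliers(series, threshold=3.0):
    """Return indices of points whose |z-score| exceeds the threshold."""
    mu = mean(series)
    sigma = stdev(series)
    if sigma == 0:
        return []
    return [i for i, x in enumerate(series) if abs(x - mu) / sigma > threshold]

# A flat series with a single spike: only the spike at index 20 is flagged.
print(zscore_outliers([1.0] * 20 + [100.0]))  # -> [20]
```

Real detectors in TODS are far more sophisticated (and handle seasonality, trends, and multivariate input), but they share this shape: score every time point, then threshold the scores.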
## Examples and Tutorials
* General Usage: [View in Colab](https://colab.research.google.com/drive/1oKKRqAQnkATsALffaf54zkDGpRseNVGZ?usp=sharing)
* Fraud Detection: [View in Colab](https://colab.research.google.com/drive/15c1Rj60XESwkC2P-BVXUocsXaBJ3M1sr?usp=sharing)
* BlockChain: [View in Colab](https://colab.research.google.com/drive/1fm6yTayjTssSMb6t0VcplBBHl5MrgLFR?usp=sharing)
## Resources
* API Documentation: [http://tods-doc.github.io](http://tods-doc.github.io)
* Paper: [https://arxiv.org/abs/2009.09822](https://arxiv.org/abs/2009.09822)
* Related Project: [AutoVideo: An Automated Video Action Recognition System](https://github.com/datamllab/autovideo)
* :loudspeaker: Do you want to learn more about data pipeline search? Please check out our [data-centric AI survey](https://arxiv.org/abs/2303.10158) and [data-centric AI resources](https://github.com/daochenzha/data-centric-AI)!
## Cite this Work
If you find this work useful, you may cite it as follows:
```
@article{Lai_Zha_Wang_Xu_Zhao_Kumar_Chen_Zumkhawaka_Wan_Martinez_Hu_2021,
title={TODS: An Automated Time Series Outlier Detection System},
volume={35},
number={18},
journal={Proceedings of the AAAI Conference on Artificial Intelligence},
author={Lai, Kwei-Herng and Zha, Daochen and Wang, Guanchu and Xu, Junjie and Zhao, Yue and Kumar, Devesh and Chen, Yile and Zumkhawaka, Purav and Wan, Minyang and Martinez, Diego and Hu, Xia},
year={2021}, month={May},
pages={16060-16062}
}
```
## Installation
This package works with **Python 3.7+** and pip 19+. You need to have the following packages installed on the system (for Debian/Ubuntu):
```
sudo apt-get install libssl-dev libcurl4-openssl-dev libyaml-dev build-essential libopenblas-dev libcap-dev ffmpeg
```
Clone the repository (if you are in China and Github is slow, you can use the mirror in [Gitee](https://gitee.com/daochenzha/tods)):
```
git clone https://github.com/datamllab/tods.git
```
Install locally with `pip`:
```
cd tods
pip install -e .
```
## Examples
Examples are available in [/examples](examples/). For basic usage, you can evaluate a pipeline on a given dataset. The example below loads the default pipeline and evaluates it on a subset of the Yahoo dataset.
```python
import pandas as pd
from tods import schemas as schemas_utils
from tods import generate_dataset, evaluate_pipeline
table_path = 'datasets/anomaly/raw_data/yahoo_sub_5.csv'
target_index = 6 # what column is the target
metric = 'F1_MACRO' # F1 on both label 0 and 1
# Read data and generate dataset
df = pd.read_csv(table_path)
dataset = generate_dataset(df, target_index)
# Load the default pipeline
pipeline = schemas_utils.load_default_pipeline()
# Run the pipeline
pipeline_result = evaluate_pipeline(dataset, pipeline, metric)
print(pipeline_result)
```
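The `F1_MACRO` metric used above averages the F1 score over both classes (0 = normal, 1 = outlier), so the rare outlier class counts as much as the majority class. A minimal stdlib sketch of the computation, independent of TODS (the function name is illustrative):

```python
# Illustrative macro-averaged F1 for binary labels:
# compute F1 per class, then take the unweighted mean.

def f1_macro(y_true, y_pred, classes=(0, 1)):
    scores = []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall)
                      if precision + recall else 0.0)
    return sum(scores) / len(scores)

# One missed outlier drags down the class-1 F1, and macro averaging
# keeps that visible even though most labels are correct.
print(round(f1_macro([0, 0, 1, 1], [0, 0, 1, 0]), 4))  # -> 0.7333
```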
We also provide AutoML support to help you automatically find a good pipeline for your data.
```python
import pandas as pd
from axolotl.backend.simple import SimpleRunner
from tods import generate_dataset, generate_problem
from tods.searcher import BruteForceSearch
# Some information
table_path = 'datasets/yahoo_sub_5.csv'
target_index = 6 # what column is the target
time_limit = 30 # how many seconds to search
metric = 'F1_MACRO' # F1 on both label 0 and 1
# Read data and generate dataset and problem
df = pd.read_csv(table_path)
dataset = generate_dataset(df, target_index=target_index)
problem_description = generate_problem(dataset, metric)
# Start backend
backend = SimpleRunner(random_seed=0)
# Start search algorithm
search = BruteForceSearch(problem_description=problem_description,
backend=backend)
# Find the best pipeline
best_runtime, best_pipeline_result = search.search_fit(input_data=[dataset], time_limit=time_limit)
best_pipeline = best_runtime.pipeline
best_output = best_pipeline_result.output
# Evaluate the best pipeline
best_scores = search.evaluate(best_pipeline).scores
```
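Conceptually, a brute-force search enumerates candidate pipeline configurations, scores each one, and keeps the best found within the time budget. The sketch below shows that search loop in stdlib Python; it is not the actual TODS/axolotl implementation, and the grid, names, and scoring function are illustrative assumptions:

```python
# Conceptual sketch of brute-force pipeline search: try every
# combination in a parameter grid, score each, keep the best one
# found before the time budget runs out.
import itertools
import time

def brute_force_search(param_grid, score_fn, time_limit=30):
    best_params, best_score = None, float("-inf")
    deadline = time.monotonic() + time_limit
    keys = sorted(param_grid)
    for values in itertools.product(*(param_grid[k] for k in keys)):
        if time.monotonic() > deadline:
            break  # respect the search time budget
        params = dict(zip(keys, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical grid: the scoring function prefers a larger window
# and a lower threshold, so the search should pick those.
grid = {"threshold": [2.0, 3.0], "window": [5, 10]}
best, score = brute_force_search(
    grid, lambda p: -abs(p["window"] - 10) - abs(p["threshold"] - 2.0))
print(best)  # -> {'threshold': 2.0, 'window': 10}
```

The real searcher evaluates full pipelines (preprocessing, feature extraction, detector) rather than a flat parameter grid, but the keep-the-best-within-a-budget loop is the same idea.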
## Acknowledgement
We gratefully acknowledge the Data Driven Discovery of Models (D3M) program of the Defense Advanced Research Projects Agency (DARPA).