OnlineLearningandSequentialAnomalyDetectioninTrajectories资源-CSDN文库

机器学习

人工智能

需积分: 13 118 浏览量 2019-03-06 13:49:09 上传评论收藏 2.95MB PDF 举报

资源推荐

资源详情

资源评论

1222 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 26, NO. 3, JUNE 2018

On-Line Anomaly Detection With High Accuracy

Kun Xie , Member, IEEE, Xiaocan Li, Xin Wang, Member, IEEE, Jiannong Cao, Fellow, IEEE,

Gaogang Xie, Member, IEEE, Jigang Wen, Dafang Zhang, Member, IEEE,andZhengQin

Abstract—Trafﬁc anomaly detection is critical for advanced

Internet management. Existing detection algorithms generally

convert the high-dimensional data to a long vector, which

compromises the detection accuracy due to the loss of spatial

information of data. Moreover, they are generally designed

based on the separation of normal and anomalous data in

a time period, which not only introduces high storage and

computation cost but also prevents timely detection of anom-

alies. Online and accurate trafﬁc anomaly detection is critical

but difﬁcult to support. To address the challenge, this paper

directly models the monitoring data in each time slot as a

2-D matrix, and detects anomalies in the new time slot based

on bilateral principal component analysis (B-PCA). We propose

several novel techniques in OnlineBPCA to support quick and

accurate anomaly detection in real time, including a novel B-

PCA-based anomaly detection principle that jointly considers

the variation of both row and column principal directions for

more accurate anomaly detection, an approximate algorithm

to avoid using iteration procedure to calculate the principal

directions in a close-form, and a sequential anomaly algorithm

to quickly update principal directions with low computation and

storage cost when receiving a new data matrix at a time slot.

To the best of our knowledge, this is the ﬁrst work that exploits

2-D PCA for anomaly detection. We have conducted extensive

simulations to compare our OnlineBPCA with the state-of-art

Manuscript received April 26, 2017; revised November 24, 2017 and

March 5, 2018; accepted March 14, 2018; approved by IEEE/ACM

RANSACTIONS ON NETWORKING Editor C. W. Tan. Date of publication

April 26, 2018; date of current version June 14, 2018. This work was

supported in part by the National Natural Science Foundation of China under

Grant 61572184, Grant 61725206, Grant 61472130, Grant 61472131, and

Grant 61772191, in part by the Hunan Provincial Natural Science Foundation

of China under Grant 2017JJ1010, in part by the Science and Technology Key

Projects of Hunan Province under Grant 2015TP1004 and Grant 2016JC2012,

in part by the U.S. ONR under Grant N00014-17-1-2730, in part by the

NSF under Grant ECCS 1408247, Grant CNS 1526843, and Grant ECCS

1731238, and in part by the Open Project Funding of the CAS Key Lab-

oratory of Network Data Science and Technology, Institute of Computing

Technology, Chinese Academy of Sciences, under Grant CASNDST201704.

(Corresponding author: Kun Xie.)

K. Xie is with the College of Computer Science and Electronics Engineer-

ing, Hunan University, Changsha 410006, China, and also with the CAS Key

Laboratory of Network Data Science and Technology, Institute of Computing

Technology, Chinese Academy of Sciences, Beijing, China, and with the

Department of Electrical and Computer Engineering, The State University

of New York at Stony Brook, Stony Brook, NY 11794 USA (e-mail:

xiekun@hnu.edu.cn).

X. Li, D. Zhang, and Z. Qin are with the College of Computer Science and

Electronics Engineering, Hunan University, Changsha 410006, China (e-mail:

hnulxc@hnu.edu.cn; dfzhang@hnu.edu.cn; zqin@hnu.edu.cn).

X. Wang is with the Department of Electrical and Computer Engineering,

The State University of New York at Stony Brook, Stony Brook, NY

11794 USA (e-mail: x.wang@stonybrook.edu).

J. Cao is with the Department of Computing, The Hong Kong Polytechnic

University, Hong Kong (e-mail: csjcao@comp.polyu.edu.hk).

G. Xie and J. Wen are with the Network Research Center, Institute of

Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

(e-mail: xie@ict.ac.cn; wenjigang@ict.ac.cn).

Digital Object Identiﬁer 10.1109/TNET.2018.2819507

anomaly detection algorithms using real trafﬁc traces Abilene

and GÈANT. Our simulation results demonstrate that, compared

with other algorithms, our OnlineBPCA can achieve signiﬁcantly

better detection performance with low false positive rate, high

true positive rate, and low computation cost.

Index Terms— Anomaly detection, on-line algorithm, bilateral

PCA.

I. INTRODUCTION

RAFFIC anomalies, caused by sources such as ﬂash

crowds, denial-of-service attacks, port scans, and the

spreading of worms, can have detrimental effects on network

services. Detecting and diagnosing these anomalies are critical

to both network operators and end users.

Existing efforts [1]–[14] on anomaly detection usually

model the trafﬁc monitoring data of a time slot as a vector

and use a trafﬁc matrix to record the trafﬁc monitoring data

of a period. In the example trafﬁc matrix of Fig.1, each

row denotes an OD (origin-destination) pair and each column

denotes a time slot. As normal trafﬁc data generally exhibit

strong spatio-temporal correlations [2], [8], [9], the normal

trafﬁc matrix has low-rank. Moreover, as it is very costly for

an attacker to compromise a large number of OD pairs for

a long period of time, the anomalous data over time also

form a sparse matrix. Based on the observations, to detect

anomalies, existing studies usually separate the observed trafﬁc

data into two parts, a low-rank normal data matrix and a sparse

outlier data matrix as shown in Fig.1. After the separation,

the anomalies are detected and located from the outlier part.

The techniques applied for anomaly detection based on

data separation include PCA [3], [5], [6], [8]–[12], Robust

PCA [14], [15], bilinear factor matrix norm minimization [16],

and recent Direct Robust Matrix Factorization (DRMF) [1],

[17]). Detecting anomalies generally based on off-line learn-

ing, these methods require storing all the monitoring data

within a period and operate on these data, which not only

introduces high storage and computation cost but also prevents

timely detection of anomalies.

It is essential to detect a sudden or unexpected change

of the trafﬁc behavior as soon as possible. Although very

important, real-time anomaly detection is extremely difﬁcult

to achieve. It requires a light-weight algorithm to accurately

and quickly identify whether the newly arriving data contain

anomalies or not. Different from data separation, there are very

limited studies on online anomaly detection. The work in [18]

attempts to check the variation of PCA transformation between

time slots to detect the anomaly. Although it is effective,

designed based on conventional PCA that only operates over a

vector of data, it still models the trafﬁc data in each time slot

See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

XIE et al.: ON-LINE ANOMALY DETECTION WITH HIGH ACCURACY 1223

Fig. 1. Trafﬁc anomaly detection methods based data separation.

as a vector. For a network consisting of N nodes, there are

N ×N OD pairs, which will form a very long vector of the size

N ×N. This will in turn result in a large covariance matrix and

large computation cost in the process of PCA transformation.

Actually, trafﬁc monitoring data captured in one time slot

can be naturally represented as an N × N matrix, with each

row denoting the transmission data from the same origin

node and each column denoting the transmission towards the

same destination. In existing anomaly detection approaches,

no matter through off-line data separation or online algorithm

of [18], the data from network measurements in one time slot

are usually modeled as a vector. The vector either directly

contains all OD pairs as in Fig.1, or is formed with the con-

catenation of rows or columns of an N ×N source-destination

matrix through “matrix-to-vector alignment”. Besides the com-

putation complexity resulted from a long vector, the spatial

information hidden in the trafﬁc data will get lost when

representing data in a vector form, which will compromise

the performance of anomaly detection.

Rather than converting data from the matrix form to vec-

tor, directly using the matrix to represent trafﬁc data in a

time slot for anomaly detection can better capture the data

relationship between rows and columns, which may help

increase the detection accuracy. To understand the possibility

and beneﬁt, in this work, we propose to directly apply bilateral

PCA (B-PCA) over the data points corresponding to

two-dimensional data matrices for on-line anomaly detection.

Although there are some recent studies [19]–[22] on the direct

use of two-dimensional PCA over image data for feature

extraction, these methods for feature extraction cannot be

directly used for anomaly detection. Applying B-PCA for on-

line anomaly detection faces three major challenges:

• There lacks a principle to exploit the data features from

both rows and columns for anomaly detection;

• Different from conventional PCA, B-PCA does not have

close-form solutions for ﬁnding the projection matrices,

which makes it even harder to quickly detect the anomaly;

• Performing B-PCA operation over a huge set of historical

data is computationally expensive and time consuming.

Different from conventional algorithms for anomaly detec-

tion, we model the network monitoring data in each time slot

as a 2D matrix, and propose an online anomaly detection

scheme based on B-PCA (termed OnlineBPCA) to detect

whether newly arriving data contain anomalies or not. In

addition, to enable online anomaly detection, we propose a

set of algorithms for quick data processing. Our OnlineBPCA

includes the following set of novel techniques to support quick

and accurate anomaly detection:

• We propose a novel two-directional-change-based anom-

aly detection principle to detect whether newly arriving

data contain anomalies, where we check the changes

of the principal directions from both row side and col-

umn side. This is the ﬁrst anomaly detection principle

proposed that allows B-PCA to be applied for anomaly

detection. As these two principal directions can more

comprehensively and accurately extract the features hid-

den in the monitoring data, our OnlineBPCA can achieve

a higher accuracy in detecting the anomaly than conven-

tional anomaly detection algorithms.

• We propose an approximate algorithm to calculate the

principal directions in a close form without using the

iteration procedure, which in turn provides the possibility

of designing a sequential algorithm for quickly detecting

anomalies online. Our simulation results demonstrate that

such an approximation does not decrease the detection

accuracy while signiﬁcantly reducing the computation

cost.

• To quickly detect anomalous data, principal directions

need to be updated to adapt to the network changes in

real time. Unlike the batch methods which process all the

data together, we propose a sequential anomaly detection

algorithm that does not require the storage of the past

data and can update the principal directions using the

most recent monitoring data. As a result, our method is

fast and preferred for streaming data and on-line anomaly

detection.

• To amplify the impact of newly arriving data on the prin-

cipal directions of the monitoring data set, we propose a

novel strength method to duplicate the newly arriving

data multiple times. This would make it easier to ﬁnd

anomalies even for a large data set.

Using trafﬁc trace data Abilene [23] and GÈANT [24],

we implement ten anomaly detection algorithms to evalu-

ate our OnlineBPCA. Compared with other peer algorithms,

OnlineBPCA can detect whether newly arriving data contain

anomalies with much lower False Positive Rate and higher

True Positive Rate. Speciﬁcally, beneﬁting from our approx-

imate algorithm and sequential algorithm, OnlineBPCA can

accurately detect the anomaly with very fast speed.

The rest of this paper is organized as follows. Section II

presents the related work. We introduce the preliminary work

on B-PCA in Section III. We describe our anomaly detection

principle, approximate algorithm to ﬁnd the projection matri-

ces, and sequential anomaly detection algorithm in Section IV,

Section V, and Section VI, respectively. Finally, we evaluate

the performance using real trafﬁc trace data in Section VIII,

and conclude the work in Section IX.

II. R

ELATED WORK

Principal Component Analysis (PCA) [7] is perhaps the

best-known statistical analysis technique to achieve data sep-

aration for anomaly detection. PCA uses an orthogonal trans-

formation to convert possibly correlated observed variables

into a set of linearly uncorrelated variables (called principal

1224 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 26, NO. 3, JUNE 2018

components or principal directions). Some recent papers that

apply PCA to the trafﬁc anomaly detection have shown some

promising initial results [3], [5], [6], [8]–[12]. The principal

components (PCs) are found and sorted in the order of

contribution to overall variance, with the PCs in the lower

dimensional and higher dimensional subspaces capturing the

dynamic properties of the system and noisy information,

respectively. As a result, to separate data, the PCs can be

divided into two sets. The trafﬁc data mapped to principal

components (PCs) in the lower dimensional subspace are

normal, and the remaining data are the anomalies. For data

separation, all trafﬁc data should be calculated and mapped

to the two subspaces, corresponding respectively to the two

types of PCs.

Although PCA-based data separation is effective when the

corruption is caused by small additive noise, recent study [13]

shows that traditional PCA-based approaches fail under the

large corruption, even if the corruption affects only very

few of the observations. To achieve better data separation,

recently, Candès et. al. [14] propose Robust PCA (RPCA)

which decomposes a given observation (noisy) matrix X into

a low-rank component X



and a sparse outlier component E.

To make the problem solvable, the work in [15] replaces

the matrix rank and the cardinality (

) functions with their

convex surrogates, the nuclear norm 

∗

(i.e., the sum of its

singular values) and the L

norm 

, and solves the following

convex optimization problem

min



{X



∗

+ λE

}

st. X



+ E = X (1)

where λ is a positive weighting parameter. To decompose

the data into low-rank component and sparse component,

these methods resort to some relaxation techniques which may

largely impact the accuracy of anomaly detection.

To conquer the challenge in RPCA, work in [17] proposes

Direct Robust Matrix Factorization (DRMF) which directly

formulates the problem in its original way using the matrix

rank to represent the low rank feature of normal data matrix

and the L

-norm to represent the sparse feature of the anomaly

data. However, the solution involves the iterative execution of

SVD decomposition, which will bring very high computation

cost and is not scalable for large trafﬁc data.

Shang et al. [16] observe two other issues of RPCA, which

leads the solution to be biased. That is, the use of nuclear

norm over penalizes large singular values, and the use of L

norm over penalizes large entries of the matrix. To address

the issues, the authors propose two bilinear factor matrix

norm minimization models for robust principal component

analysis. Speciﬁcally, the paper considers two speciﬁc l

-norm

minimization, with p =1/2 and p =2/3, respectively.

Although data separation is an effective way of detecting

anomalies appearing within a period, it is not suitable for

online detection. Data separation approaches usually work off-

line and require operating on the whole set of trafﬁc data

captured in a period consisting of multiple time slots, which

consume a large amount of storage and long computation

time. Moreover, some data separation techniques such as

Fig. 2. Matrix-to-vector alignment.

RPCA and DRMF need a time-consuming iterative process to

separate the observed trafﬁc data, which further increases the

computation cost. In contrast, our anomaly detection algorithm

aims to quickly and accurately detect whether trafﬁc data

newly arriving in a time slot contain anomalies.

Very limited work [18] studies the on-line anomaly detec-

tion. Different from PCA-based algorithms [5], [6], [8]–[11]

[3], [12] that separate normal data and anomalous data by

mapping the trafﬁc data into two types of PCs, [18] proposes

to detect whether the new data contain anomalies by checking

the impact on PCs from the newly arriving data. However,

as it is designed based on conventional PCA which can only

operate vector-form based data points, the trafﬁc monitoring

data of each time slot in [18] is modeled as a vector like other

trafﬁc anomaly detection algorithms, which suffer from the

loss of spatial data correlation and computation complexity.

As shown in Fig.2, the entries of ﬁrst column denote the

trafﬁc data from different source nodes to the same destination

node a, and are thus correlated. However, they are set far away

from each other in the vectorized representation. A column

anomaly can happen when a network of remotely controlled

and widely scattered Zombies or Botnet computers launch

DDoS attacks by simultaneously sending a large amount of

trafﬁc and/or a large number of service requests to the target

system. Besides losing the spatial information and compro-

mising the anomaly detection, “matrix-to-vector alignment”

also leads the vector to be long, which will result in a

large size covariance matrix in [18]’s approach. Computing

the eigenvectors of a large covariance matrix is very time-

consuming.

Different from conventional PCA which can only deal with

vector data, a new technique called two-dimensional principal

component analysis (2DPCA) [19] was recently proposed in

the image ﬁeld to handle 2D image, which directly computes

eigenvectors of the so-called covariance matrix without matrix-

to-vector conversion. Because the size of the covariance matrix

is equal to the width of images, which is quite small compared

with the size of a covariance matrix in PCA, 2DPCA evaluates

the image covariance matrix more accurately and computes the

corresponding eigenvectors more efﬁciently than PCA. Based

on tests over data from several databases [19], the accuracy

of face recognition using 2DPCA was found to be higher

than that using PCA, and the extraction of image features is

computationally more efﬁcient.

However, studies in [26] show that 2DPCA is essentially

working in the row direction of images. If we apply 2DPCA

to anomaly detection, the information hidden in the data matrix

can not be fully utilized, so the detection accuracy will be still

low. To overcome the restrictions under 2DPCA, two linear

剩余13页未读，继续阅读

评论收藏

内容反馈

fuwell

粉丝: 1
资源: 13

Online Learning and Sequential Anomaly Detection in Trajectories

最新资源

Online Learning and Sequential Anomaly Detection in Trajectories

Anomaly-Detection-for-Sequential-Data

Road Anomaly Detection Through Deep Learning Approaches

Machine-Learning-Algorithms-for-Predators-Detection-in-Online-Chat-Conversations-Paper:一份ML论文旨在分析不同领域的社会问题并找到社会公益，SVM，MLP，CNN，GAN的最佳解决方案

Machine+Learning+Methods+for+Behaviour+Analysis+and+Anomaly+Detection-2018.pdf

Sequential Analysis - Hypothesis Testing and Changepoint Detection

OS-ELM.zip_ELM序列算法_online Learning_os elm_在线ELM

Bishop Pattern Recognition and Machine Learning

Mastering Machine Learning with R - Second Edition

Deep-Learning-for-Online-Defect-Detection-in-a-Ceramic-Tape-Casting-Process

2.Deep Learning for Anomaly Detection A Review 论文分享（中）.pdf

Deep-Learning-Time-Series-Anomaly-Detection-master_.zip

Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow

伯克利大学机器学习-12Time series&sequential hypothesis testing&anomaly de

Algorithm Design Parallel and Sequential 2017.9.pdf

reinforce learning.pdf

ICML 2018强化学习tutorial: Imitation Learning

Deep Learning for Anomaly Detection A Review.pdf

Network Anomaly Detection-Machine Learning perspective

A Discriminative Metric Learning Based Anomaly Detection Method

Network Anomaly Detection: A Machine Learning Perspective

《Deep Learning》Yann LeCun, YoshuaBengio, Geoffery Hinton

Context_aware Sequential Recommender

Object Tracking

最新资源