174 HAN ET AL.
Stream data cube architecture facilitates online analytical processing of stream data. It also forms a preliminary
data structure for online stream data mining. The impact of the design and implementation of the stream data
cube in the context of stream data mining is also discussed in the paper.
1. Introduction
After years of research and development in data warehousing and OLAP technology [9,
15], a large number of data warehouses and data cubes have been successfully constructed
and deployed in applications. The data cube has become an essential component of most
data warehouse systems and of some extended relational database systems, and it plays
an increasingly important role in data analysis and intelligent decision support.
The data warehouse and OLAP technology is based on the integration and consolidation
of data in multi-dimensional space to facilitate powerful and fast on-line data analysis.
Data are aggregated either completely or partially in multiple dimensions and multiple
levels, and are stored in the form of either relations or multi-dimensional arrays [1, 29]. The
dimensions in a data cube are categorical data, such as product, region, and time, and
the measures are numerical data, representing various kinds of aggregates, such as the sum,
average, or variance of sales or profits.
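To make the notion of aggregating "in multiple dimensions" concrete, the following is a minimal sketch, not the architecture proposed in this paper. It assumes a tiny hypothetical fact table with three categorical dimensions and one numerical measure (the names `DIMENSIONS`, `FACTS`, and `build_cube` are illustrative only) and computes the sum of the measure for every combination of dimensions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: three categorical dimensions and one sales measure.
DIMENSIONS = ("product", "region", "month")
FACTS = [
    ("tv", "east", "jan", 100.0),
    ("tv", "west", "jan", 150.0),
    ("radio", "east", "jan", 40.0),
    ("tv", "east", "feb", 120.0),
]

def build_cube(facts, n_dims):
    """Return {group-by dimension indexes: {cell key: sum of the measure}}.

    Every subset of the n_dims dimensions is one group-by, so the result
    holds 2**n_dims group-bys. The measure is the last field of each row.
    """
    cube = {}
    for k in range(n_dims + 1):
        for dims in combinations(range(n_dims), k):
            cells = defaultdict(float)
            for row in facts:
                key = tuple(row[d] for d in dims)
                cells[key] += row[-1]
            cube[dims] = dict(cells)
    return cube

cube = build_cube(FACTS, len(DIMENSIONS))
total_sales = cube[()][()]              # grand total over all dimensions
tv_sales = cube[(0,)][("tv",)]          # sales grouped by product only
cell_count = sum(len(cells) for cells in cube.values())
```

Even for this four-row table, the fully materialized cube holds more cells than there are input rows, and the sketch rescans the facts once per group-by.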
The success of OLAP technology naturally leads to its possible extension from the
analysis of static, pre-integrated, historical data to that of current, dynamically changing
data, including time-series data, scientific and engineering data, and data produced in other
dynamic environments, such as power supply, network traffic, stock exchange,
telecommunication data flow, Web click streams, weather or environment monitoring, etc.
The analysis of stream data differs fundamentally from that of relational and
warehouse data in that stream data is generated in huge volume, flows in and out
dynamically, and changes rapidly. Because the memory, disk space, and processing
power available in today’s computers are limited, most data streams can be examined
only in a single pass. These characteristics of stream data have been emphasized and investigated by
many researchers, such as [6, 7, 12, 14, 16], and efficient stream data querying, clustering
and classification algorithms have been proposed recently (such as [12, 14, 16, 17, 20]).
However, another important characteristic of stream data has not drawn enough
attention: most stream data resides at a rather low level of abstraction, whereas an analyst
is often more interested in higher and multiple levels of abstraction. As with OLAP
analysis of static data, multi-level, multi-dimensional on-line analysis should be performed
on stream data as well.
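The two characteristics above, single-pass examination and the need for higher levels of abstraction, can be illustrated together. The following is a minimal sketch under stated assumptions, not the method developed in this paper: it assumes hypothetical records of the form (timestamp in seconds, sensor id, value) and a hypothetical concept hierarchy `SENSOR_TO_REGION`, and it rolls each record up to a (minute, region) cell as it arrives, keeping only a count and a sum per cell.

```python
from collections import defaultdict

# Hypothetical concept hierarchy: low-level sensor id -> higher-level region.
SENSOR_TO_REGION = {"s1": "north", "s2": "north", "s3": "south"}

def roll_up(stream, seconds_per_bucket=60):
    """Consume (timestamp_sec, sensor_id, value) records exactly once.

    Only a running count and sum are kept per (time bucket, region) cell;
    the raw low-level records are never stored.
    """
    cells = defaultdict(lambda: [0, 0.0])  # cell key -> [count, sum]
    for ts, sensor, value in stream:
        key = (ts // seconds_per_bucket, SENSOR_TO_REGION[sensor])
        cell = cells[key]
        cell[0] += 1
        cell[1] += value
    return {
        key: {"count": count, "sum": total, "avg": total / count}
        for key, (count, total) in cells.items()
    }

# An iterator can be read only once, which mimics a single pass over a stream.
readings = iter([(0, "s1", 10.0), (30, "s2", 20.0), (45, "s3", 5.0), (61, "s1", 40.0)])
summary = roll_up(readings)
```

Count and sum are sufficient here because the average can be derived from them at any time, so memory grows with the number of higher-level cells rather than with the length of the stream.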
The requirement for multi-level, multi-dimensional on-line analysis of stream data,
though desirable, raises a challenging research issue: “Is it feasible to perform OLAP
analysis on huge volumes of stream data since a data cube is usually much bigger than the
original data set, and its construction may take multiple database scans?”
In this paper, we examine this issue and present an architecture for on-line
analytical processing of stream data. Stream data is generated continuously in a dynamic
environment, with huge volume, infinite flow, and fast-changing behavior. As collected, such
data is almost always at a rather low level, consisting of various kinds of detailed temporal
and other features. To find interesting or unusual patterns, it is essential to perform analysis
on some useful measures, such as sum, average, or even more sophisticated measures, such