are the desired results (correspond to lying activity), they are
ranked out of top-20. We show the normalized Q, S
1
and S
6
in
Fig. 1(b). It is difficult to distinguish them after normalization.
By observing Fig. 1(a), one can filter the undesired results
easily by adding an additional constraint: the output subse-
quences should have similar mean value as Q. In fact, this
new type of NSM query, NSM plus some constraints, is useful
in many applications. We list two of them as follows,
• Industry application. In the wind power generation field,
LIDAR system can provide preview information of wind
disturbances [9]. Extreme Operating Gust (EOG) is a
typical gust pattern which is a phenomenon of dramatic
changes of wind speed in a short period. Fig. 2 shows
a typical EOG pattern. This pattern is important because
it may generate damage on the turbine. All EOG pattern
occurrences have the similar shape, and their fluctuation
degree falls within certain range, because the wind speed
cannot be arbitrarily high. If we hope to find all EOG
pattern occurrences in the historical data, we can use a
typical EOG pattern as the query, plus the constraint on
the range of the values.
• IoT application. When a container truck goes through
a bridge, the strain meter planted in the bridge will
demonstrate a specific fluctuation pattern. The value
range in the pattern depends on the weight of the truck.
If we have one occurrence of the pattern as a query, we
can additionally set a mean value range as the constraint
to search container trucks whose weight falls within a
certain range.
Note that the above applications cannot be handled by
RSM query, because the existing offset shifting and amplitude
scaling forces us to set a very large distance threshold, which
will cause many false positive results.
Furthermore, to verify the universality of this new query
type, we investigate the motif pairs in some popular real-world
time series benchmarks. Motif mining [2] is an important time
series mining task, which finds a pair (or set) of subsequences
with minimal normalized distance. For a motif subsequence
pair, say X and Y , we show the relative mean value difference
(ΔMean=
|μ
X
−μ
Y
|
max −min
) and the ratio of standard deviation
(ΔStd= |
σ
X
σ
Y
|) in Fig. 3 We can see that although these pairs
are found without any constraint (like NSM query), both mean
value and standard deviation of motif subsequences are very
similar. So we can find these pairs by the cNSM query, a NSM
query plus a small constraint.
In this paper, we formally define a new subsequence
matching problem, called constrained normalized subsequence
matching problem (cNSM for short). Two constraints, one for
mean value and the other for standard deviation, are added to
the traditional NSM problem. One exemplar of cNSM query
looks like “given a query Q with mean value μ
Q
and stan-
dard deviation σ
Q
, return subsequences S which satisfy: (1)
Dist(
ˆ
S,
ˆ
Q) ≤ 1.5; (2) |μ
Q
−μ
S
|≤5; (3) 0.5 ≤ σ
Q
/σ
S
≤ 2”.
With the constraint, the cNSM problem provides a knob to
flexibly control the degree of offset shifting (represented by
î
Fig. 2. EOG pattern
7D[L
'DWDVHW
Ƹ
0HDQ
3RZHU
7HPSHUDWXUH
Ƹ
6WG
3HQJXLQ
&RPPXWH
(&*
(&*
1356
9LGHR
7(.
Fig. 3. Motif example
mean value) and amplitude scaling (represented by standard
deviation). Moreover, the cNSM problem offers us the oppor-
tunity to build index for the normalized subsequence matching.
Challenges. Solving the cNSM problem faces the following
challenges. First, how can we process the cNSM query effi-
ciently? A straightforward approach is to first apply UCR Suite
to find unconstrained results, and then use mean value and
standard deviation constraints to prune the unqualified ones.
However, it still needs to scan the full series. Can we build an
index and process the query more efficiently?
Second, users often conduct the similar subsequence search
in an exploratory and interactive fashion. Users may try dif-
ferent distance functions, like Euclidean distance or Dynamic
Time Warping. Meanwhile, users may try RSM and cNSM
query simultaneously. Can we build a single index to support
all these query types?
Contributions. Besides proposing the cNSM problem, we
also have the following contributions.
• We present the filtering conditions for four query types,
RSM-ED, RSM-DTW, cNSM-ED and cNSM-DTW, and
prove the correctness. The conditions enable us to build
index and meanwhile guarantee no false dismissals.
• We propose a new index structure, KV-index, and the
query processing approach, KV-match, to support all
these query types. The biggest advantage is that we can
process various types of queries efficiently with a single
index. Moreover, KV-match only needs a few numbers of
sequential scans of the index, instead of many random
accesses of tree nodes in the traditional R-tree index,
which makes it much more efficient.
• Third, to support the query of arbitrary lengths efficiently,
we extend KV-match to KV-match
DP
, which utilizes mul-
tiple indexes with different window lengths. We conduct
extensive experiments. The results verify the efficiency
and effectiveness of our approach.
The rest of the paper is organized as follows. We present
the preliminary knowledge and problem statements in Sec-
tion II. In Section III we introduce the theoretical foundation
and motivate the approach. Section IV and V describe our
index structure, index building algorithm and query processing
algorithm. Section VI extends our method to use multi-
level indexes with different window lengths. The experimental
results are presented in Section VII and we discuss related
867