Optimizing Search Strategies in k-d Trees
NEAL SAMPLE
†
, MATTHEW HAINES
‡
, MARK ARNOLD
‡
, TIMOTHY PURCELL
†
†
Department of Computer Science
Stanford University
Stanford, CA 94305
UNITED STATES
{nsample, tpurcell}@cs.stanford.edu
‡
Department of Computer Science
University of Wyoming
Laramie, WY 82070
UNITED STATES
{haines, marnold}@cs.uwyo.edu
Abstract: K-d trees have been widely studied, yet their complete advantages are often not realized due to
ineffective search implementations and degrading performance in high dimensional spaces. We outline an
effective search algorithm for k-d trees that combines an optimal depth-first branch and bound (DFBB) strategy
with a unique method for path ordering and pruning. This technique was developed for improving nearest neighbor
(NN) search, but has also proven effective for k-NN and approximate k-NN queries.
Key-Words: k-d trees, search, high dimensionality, DFBB, nearest neighbor, k-NN
1 Introduction
Search is an important facet of many applications, and
various search methods are continually being refined.
Different types of search are of interest to different
disciplines. Nearest neighbor (NN) search is
important to many case-based reasoning (CBR) as
well as various classification and matching problems
[9]. Approximate nearest neighbor search is important
to many AI systems, and in systems where there is an
acceptable trade-off between exact answers and
performance. K-NN and other multivariate range
queries play critical roles in database retrieval,
classification problems, and clustering problems.
Various techniques have been used to solve search
problems, including hashing and indexing, various
types of trees, and many hybrid and novel approaches.
Proposed tree solutions alone include k-d, B+, R+,
BBD, VAMSplit k-d, red-black, Patricia and other
variants [2, 3, 5, 6, 7, 8, 9, 11]. Tree-based search
strategies are popular for many reasons, including, for
n cases, O(log n) search and insertion time, O(n log n)
construction time, and reasonable space requirements.
Tree structures also allow for dynamic insertion of
additional elements and support for range queries.
In this paper, we outline essential optimizations
for performing various types of searches on a k-d tree
structure. The first optimization, tracking nodes,
provides an efficient pruning technique for searching
high-order trees. The second optimization, depth first
branch and bound (DFBB) search, provides a bound
on the search depth. The third optimization, path
ordering, is used in conjunction with DFBB to select
an efficient search order. The fourth optimization, the
addition of an information structure at tree build time,
allows for further pruning with a Bounds-Overlap-Ball
(BOB) test.
Algorithms have been proposed that have
reasonable performance on a subset of the searches
outlined above, but have significant limitations that
prevent their extension to all domains. Recently,
Nene and Nayar proposed a method for effective NN
search in high dimensions [1]. Their method, while
providing a good search time, generates a static
structure that prevents the insertion of additional
elements without an expensive tree rebuilding. Such a
solution would not be appropriate for many domains,
including growing case bases and expanding
databases. K-d trees, while limited by well
understood problems, are nonetheless an important
search structure, and should be used as effectively as
possible.
The k-d tree has been analyzed extensively, but
not exhaustively, since its introduction in 1975 [6].
Friedman, Bentley, and Finkel presented good search
and construction algorithms for NN searching with k-
d trees as early as 1977 [2]. More recently, Arya and
Mount have presented refined search tactics that have
been especially effective for approximate searches [3].
The refinements present here are effective for all the