FLANN - Fast Library for Approximate Nearest
Neighbors
User Manual
Marius Muja, mariusm@cs.ubc.ca
David Lowe, lowe@cs.ubc.ca
February 27, 2009
1 Introduction
We can define the nearest neighbor search (NSS) problem in the following way:
given a set of points P = p
1
, p
2
, . . . , p
n
in a metric space X, these points must
be preprocessed in such a way that given a new query point q ∈ X, finding the
point in P that is nearest to q can be done quickly.
The problem of nearest neighbor search is one of major importance in a
variety of applications such as image recognition, data compression, pattern
recognition and classification, machine learning, document retrieval systems,
statistics and data analysis. However, solving this problem in high dimensional
spaces seems to be a very difficult task and there is no algorithm that performs
significantly better than the standard brute-force search. This has lead to an
increasing interest in a class of algorithms that perform approximate nearest
neighbor searches, which have proven to be a good-enough approximation in
most practical applications and in most cases, orders of magnitude faster that
the algorithms performing the exact searches.
FLANN (Fast Library for Approximate Nearest Neighbors) is a library for
performing fast approximate nearest neighbor searches. FLANN is written in
the C++ programming language. FLANN can be easily used in many contexts
through the C, MATLAB and Python bindings provided with the library.
1.1 Quick Start
This section contains small examples of how to use the FLANN library from
different programming languages (C/C++, MATLAB and Python) and from
the command line.
• C/C++
// file flann_example.cc
#include "flann.h"
#include <stdio.h>
#include <assert.h>
// Function that reads a dataset
float* read_points(char* filename, int *rows, int *cols);
int main(int argc, char** argv)
{
int rows,cols;
int t_rows, t_cols;
float speedup;
// read dataset points from file dataset.dat
float* dataset = read_points("dataset.dat", &rows, &cols);
float* testset = read_points("testset.dat", &t_rows, &t_cols);
// points in dataset and testset should have the same dimensionality
assert(cols==t_cols);
// number of nearest neighbors to search
int nn = 3;
// allocate memory for the nearest-neighbors indices
int* result = new int[t_rows*nn];
1
// allocate memory for the distances
float* dists = new int[t_rows*nn];
// index parameters are stored here
FLANNParameters p;
// want 90% target precision
// the rest of the parameters are automatically computed
p.target_precision = 0.9;
// compute the 3 nearest-neighbors of each point in the testset
flann_find_nearest_neighbors(dataset, rows, cols, testset, t_rows,
result, dists, nn, &p);
// ...
delete[] dataset;
delete[] testset;
delete[] result;
delete[] dists;
return 0;
}
• MATLAB
% create random dataset and test set
dataset = single(rand(128,10000));
testset = single(rand(128,1000));
% define index and search parameters
params.algorithm = ’kdtree’;
params.trees = 8;
params.checks = 64;
% perform the nearest-neighbor search
[result, dists] = flann_search(dataset,testset,5,params);
• Python
from pyflann import *
from numpy import *
from numpy.random import *
dataset = rand(10000, 128)
testset = rand(1000, 128)
flann = FLANN()
result,dists = flann.nn(dataset,testset,5,algorithm="kmeans",
branching=32, iterations=7, checks=16);
• Command line application
$ flann compute_nn --input-file=dataset.dat --test-file=testset.dat
--algorithm=kdtree --trees=8 --checks=64 --nn=5 --output-file=nn.dat
Reading input dataset from dataset.dat
Building index
Building index took: 0.76
Reading test dataset from testset.dat
Searching for nearest neighbors
Searching took 0.06 seconds
Writing matches to nn.dat
2
2 Downloading and compiling FLANN
FLANN can be downloaded from the following address:
http://www.cs.ubc.ca/∼mariusm/flann
After downloading and unpacking, the following files and directories should
be present:
• src: directory containg the source files
• doc: directory containg this documentation
• bin: directory various for scripts and binary files
In addition, the FLANN libary will install itself in the directory build (un-
less overridden by the CMAKE INSTALL PREFIX variable). The build directory
will have the following structure:
• bin: contains the command line application (flann) and other binary files
• lib: contains the compiled libraries (libflann.so and libflann s.a for
linux)
• include: contains the C header files
• matlab: contains the MATLAB wrapper functions and a MEX (Matlab
EXecutable) file
• python: contains the python bindings
To compile the flann library the CMake
1
build system is required. Below is
an example of how the FLANN library can be compiled on Linux (replace x.y
with the corresponding version number).
$ cd flann-x.y-src
$ mkdir tmp
$ cd tmp
$ cmake ../src -DCMAKE_BUILD_TYPE=release
$ make install
On windows the steps are similar:
> cd flann-x.y-src
> md tmp
> cd tmp
> cmake ..\src -G ‘‘NMake Makefiles’’ -DCMAKE_BUILD_TYPE=release
> nmake install
1
http://www.cmake.org/
3
3 Using FLANN
3.1 Using FLANN from MATLAB
The FLANN library can be used from MATLAB through the following wrap-
per functions: flann build index, flann search and flann free index. The
flann build index function creates a search index from the dataset points,
flann search uses this index to perform nearest-neighbor searches and flann free index
deletes the index and releases the memory it uses.
Note that in the binary distribution of FLANN the MEX file is linked against
the shared version of FLANN (flann.so or flann.dll), so on Linux you must
set the LD LIBRARY PATH environment variable accordingly prior to starting
MATLAB. On Windows is enough to have flann.dll in the same directory
with the MEX file.
The following sections describe in more detail the FLANN matlab wrapper
functions and show examples of how they may be used.
3.1.1 flann build index
This function creates a search index from the initial dataset of points, index
used later for fast nearest-neighbor searches in the dataset.
[index, parameters, speedup] = flann_build_index(dataset,build_params);
The arguments passed to the flann build index function have the following
meaning:
dataset is a d × n matrix containing n d-dimensional points
build params - is a MATLAB structure containing the parameters passed to
the function.
Depending on the contents of the build params structure, the function has
two different behaviors. If the structure contains a field that specifies the index
type to create the function will create an index of that type (using index pa-
rameters which also have to be included in the build params structure). If the
index and index parameters are not specified directly the function will first try
to automatically detect the best index and index parameters to use for nearest
neighbor search in the provided dataset.
Using automatic index and parameter configuration When using au-
tomatic configuration the build params structure must contain the following
fields:
target precision - is a number between 0 and 1 specifying the percentage of
the approximate nearest-neighbor searches that return the exact nearest-
neighbor. Using a higher value for this parameter gives more accurate
results, but the searching takes longer. The optimum value usually de-
pends on the application.
4