sensors, like multilateration from cell towers and WiFi access
points. Varying the sampling rate shows the minimum amount of
data required for good map matching. This is important for
practitioners who must decide on how often their GPS receivers
should sample location data, which affects requirements for
memory and bandwidth. Our third contribution is that we make
our GPS data, ground truth, and road network representation
publicly available for other researchers to use in their map
matching work. We believe this is the first time that such a data
set has been made publicly available. Until now, all the work on
map matching used private data sets for testing, making it
impossible to objectively compare results from different
algorithms.
2. THE MAP MATCHING PROBLEM
The map matching problem is illustrated in Figure 1. There are
three measured locations in sequence shown as black dots. The
problem is to find which roads the vehicle was on. The most
obvious algorithm is to simply match each point with the nearest
road. Due to measurement noise, however, this algorithm is prone
to error. In the illustration, the actual path is obvious, but the 2
nd
and 3
rd
point would be mismatched if they were associated with
the nearest road. Even using modern GPS receivers, we have
observed gross outliers and extended sequences of of erroneous
points, likely due to urban canyons and other terrestrial features
that affect GPS signals. Because of problems like this, modern
map matching takes into account sequences of points before
deciding on a match. In the example, there is really only one
reasonable path on the road network that could have produced the
observed measurements.
In our work, like most other map matching work, the raw input
data consists of vehicle locations measured by GPS, as shown in
Figure 2. Each measured point consists of a time-stamped
latitude/longitude pair. The roads are also represented in the
conventional way, as a graph of nodes and edges. The nodes are at
intersections, dead ends, and road name changes, and the edges
represent road segments between the nodes. Some edges are
directional to indicate one-way roads. Each node has an associated
latitude/longitude to indicate its location, and each edge has a
polyline of latitude/longitude pairs to represent its geometry.
Since point-by-point, nearest road matching often fails,
researchers have developed methods that match several points at
once. One way to do this is to create a (possibly smoothed) curve
from the location measurements and attempt to find matching
roads with similar geometry. As an example, White et al. [16]
present four algorithms, starting with the simple, nearest match
scheme. Their second algorithm adds orientation information to
the nearest match approach, comparing the measured heading to
the angle of the road. Their third algorithm evolves the second
algorithm to include connectivity constraints, and their fourth
algorithm does curve matching. They were surprised to discover
that their most sophisticated algorithm, the fourth one, was
outperformed by the simpler second algorithm when tested on a
total of about 17 km of driving data. Another purely geometric
approach comes from Greenfeld [5], whose algorithm builds up a
topologically feasible path through the road network. Matches are
determined by a similarity measure that weights errors based on
distance and orientation. The algorithm was found to perform
Figure 2: This is the GPS data we used for testing in the Seattle, Washington, USA area. The trip starts in the upper right near
Marymoor Park. It consists of 7531 GPS points sampled at 1 Hz, and it covers about 80 kilometers (50 miles) over about 2
hours.