2
mine whether each bin is ground.
• Our proposed method shows promising performance over
the state-of-the-art, region-wise fitting–based methods at
more than 40 Hz. In particular, Patchwork estimates
the ground points with the least recall variance, which
shows that our proposed method overcomes the under-
segmentation issue in complex urban environments.
II. RELATED WORKS
A. The Difficulties of Ground Segmentation
One may argue that it is a simple task that can be easily esti-
mated by filtering a point cloud based on sensor height or using
RANSAC [17] which is a renowned method for estimating a
plane. Unfortunately, there are three main issues that impede
algorithms from conducting precise ground segmentation: a)
there exists a partially steep slope or bumpy road, b) curbs
or flower beds make some regions uneven, and c) because all
surrounding objects are taken into account as outliers in the
ground segmentation tasks, these objects hinder plane fitting.
For these reasons, sometimes under-segmentation occurs, in
which case points belonging to different objects are merged
into the same segment [6], [12].
B. Ground Plane Estimation Methods
To tackle these issues, numerous researchers have studied
various approaches. For instance, Douillard et al. [4] and Chen
et al. [18] employed Gaussian process–based methods. On the
other hand, Tse et al. [19], Byun et al. [3], and Rummelhard
et al. [20] proposed Markov Random Field–based methods.
These methods can be used to estimate detailed ground points
yet requiring much computational time, so it may not be
appropriate to use them as preprocessing algorithms whose
speed should be guaranteed at more than 20 Hz.
C. Scan Representation
Meanwhile, grid representation–based methods have been
widely utilized to leverage expressibility compared with sin-
gular plane model–based methods [4], [9]. In particular, polar
grid representation, which treats a point cloud in cylindrical
coordinates, is commonly employed these days because it
naturally compensates for the geometric characteristics of
3D LiDAR sensors [11], [12], [14], [16], [21]. In practice,
Thrun et al. [5] presented a grid cell–based binary ground
classification method in a probabilistic way to predict the
movable area for autonomous driving in the DARPA challenge.
These methods are mainly divided into two categories: a)
elevation map–based and b) model fitting–based methods.
Accordingly, the latter category can be further classified into
two main methodologies: a) line fitting–based and b) plane
fitting–based methods.
D. Elevation Map–based 2.5D Grid Representation
First, elevation map–based methods are used to distinguish
between ground and non-ground points by encoding a 3D
point cloud into 2.5D grid representations [5], [9]. Thrun et al.
[5] utilized relative height and Asvadi et al. [9] used average
height and its covariance on each grid. These methods have
strong advantages over other methods in terms of speed and
computational cost. However, there are some potential risks
that sometimes a steep slope region could be considered as a
non-ground region because of large z value difference between
its supremum and infimum points with respect to Z-axis.
E. Multiple Line Fitting–based Ground Segmentation
Next, Himmelsbach et al. [11] and Steinhauser et al. [21]
introduced 2D line fitting on a uniform polar grid repre-
sentation to estimate the straight-line equation on each grid.
Then, in each grid, it was determined whether points were
ground points by comparing between constant thresholds and
the estimated parameters, such as the point-to-line distance,
gradient, or y-intercept.
F. Multiple Plane Fitting–based Ground Segmentation
Sharing their views of region-wise fitting yet improving ro-
bustness, other researchers have conducted region-wise plane
fitting–based approaches [8], [12], [14], [16]. For instance,
Zermas et al. [8] divided a point cloud into three parts
along the x-axis of the body frame, which is the forward
direction of a vehicle. This method is based on the premise
that a slope usually changes along the x-axis; however, this
assumption sometimes fails when it comes to a bumpy road
or a complex intersection. To resolve the problem, Narksri
et al. [12] proposed a slope-robust method using consecutive
ring patterns in the scan data as well as the concept of the
continuity of the region-wise estimated plane along the radial
direction. Furthermore, Narksri et al. [12] and Cheng et al.
[16] proposed an adaptive way of setting a grid size depending
on the density of the cloud points or the incidence angle.
G. Deep Learning-based Methods
Of course, as the deep learning era has come, Milioto et
al. [7] proposed RangeNet++ to estimate point-wise labels
on a 3D point cloud and Paigwar et al. [22] presented
GndNet, which estimates ground plane elevation information
in a grid–based representation to discern ground points in
real time. Unfortunately, these methods usually require high
computational resources. In addition, these methods tend to
be highly fitted to the environments of train dataset; thus, the
performance of those can be potentially degraded when used
in quite different environments from the training dataset or
different sensor configuration [23].
III. METHODOLOGY OF PATCHWORK
The following paragraphs highlight the problem definition
and the reasoning behind each module of Patchwork. Patch-
work mainly consists of three parts: CZM, R-GPF, and GLE.
A. Problem Definition
First, we begin by denoting a point cloud at the moment
as P. Then, let P = {p
1
, p
2
, . . . , p
k
, . . . , p
N
} be a set of
cloud points that contain N points at the moment acquired by
a 3D LiDAR sensor, where each point p
k
consists of p
k
=
{x
k
, y
k
, z
k
} in the Cartesian coordinates. In this paper, P is
definitely classified into two classes: a set of ground points,
G, and its complement, G
c
, which satisfy G ∪ G
c
= P. Note