Difficulties include non-Lambertian surfaces (e.g., reflectance, transparency), large displacements (e.g., high speed), a large variety of materials (e.g., matte vs. shiny), as well as different lighting conditions (e.g., sunny vs. cloudy).
Our 3D visual odometry / SLAM dataset consists of
22 stereo sequences, with a total length of 39.2 km. To
date, datasets falling into this category are either monocular
and short [43] or consist of low quality imagery [42, 4, 35].
They typically do not provide an evaluation metric, and as
a consequence there is no consensus on which benchmark
should be used to evaluate visual odometry / SLAM ap-
proaches. Thus often only qualitative results are presented,
with the notable exception of laser-based SLAM [28]. We
believe a fair comparison is possible in our benchmark due
to its large scale nature as well as the novel metrics we pro-
pose, which capture different sources of error by evaluating
error statistics over all sub-sequences of a given trajectory
length or driving speed.
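To make the metric concrete, the following sketch computes one such statistic: the average translational error of the relative motion over all sub-sequences of a fixed length. It is a simplified illustration rather than our development-kit code; the pose format (4 x 4 camera-to-world matrices) and the 100 m segment length are assumptions.

import numpy as np

def translation_error(gt_poses, est_poses, seg_len=100.0):
    # Average relative translation error over all sub-sequences of roughly
    # seg_len meters (illustrative sketch, not the official evaluation code).
    # gt_poses / est_poses: lists of 4x4 camera-to-world pose matrices.
    dists = np.zeros(len(gt_poses))
    for i in range(1, len(gt_poses)):
        dists[i] = dists[i - 1] + np.linalg.norm(
            gt_poses[i][:3, 3] - gt_poses[i - 1][:3, 3])
    errors = []
    for start in range(len(gt_poses)):
        ends = np.where(dists >= dists[start] + seg_len)[0]
        if len(ends) == 0:
            break  # trajectory too short for further sub-sequences
        end = ends[0]
        # Relative motion over the sub-sequence for ground truth and estimate.
        rel_gt = np.linalg.inv(gt_poses[start]) @ gt_poses[end]
        rel_est = np.linalg.inv(est_poses[start]) @ est_poses[end]
        # Residual motion between the two, normalized by the segment length.
        residual = np.linalg.inv(rel_gt) @ rel_est
        errors.append(np.linalg.norm(residual[:3, 3]) / seg_len)
    return float(np.mean(errors)) if errors else 0.0

The same loop can be repeated for several segment lengths, or with sub-sequences binned by driving speed, to obtain the error statistics described above.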
Our 3D object benchmark focuses on computer vision
algorithms for object detection and 3D orientation estima-
tion. While existing benchmarks for those tasks do not pro-
vide accurate 3D information [17, 39, 15, 16] or lack realism [33, 31, 34], our dataset provides accurate 3D bounding
boxes for object classes such as cars, vans, trucks, pedes-
trians, cyclists and trams. We obtain this information by
manually labeling objects in 3D point clouds produced by
our Velodyne system, and projecting them back into the im-
age. This results in tracklets with accurate 3D poses, which
can be used to assess the performance of algorithms for 3D
orientation estimation and 3D tracking.
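As an illustration of how such a label is used, the sketch below maps the eight corners of one 3D box, parameterized in Velodyne coordinates, into the image plane. The parameterization (center, size, yaw about the vertical axis) and the matrix names T_cam_velo and P_rect are assumptions made for this example, not a definitive interface.

import numpy as np

def project_box_corners(center, size, yaw, T_cam_velo, P_rect):
    # Project the 8 corners of a 3D bounding box (Velodyne coordinates)
    # into the rectified camera image. Illustrative sketch only.
    l, w, h = size
    # Corner offsets in the object frame, centered at the box center.
    x = np.array([ l,  l, -l, -l,  l,  l, -l, -l]) / 2.0
    y = np.array([ w, -w, -w,  w,  w, -w, -w,  w]) / 2.0
    z = np.array([ h,  h,  h,  h, -h, -h, -h, -h]) / 2.0
    R = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
                  [np.sin(yaw),  np.cos(yaw), 0.0],
                  [0.0,          0.0,         1.0]])
    corners = R @ np.vstack([x, y, z]) + np.asarray(center).reshape(3, 1)
    corners_h = np.vstack([corners, np.ones((1, 8))])   # homogeneous, 4x8
    cam = T_cam_velo @ corners_h      # Velodyne -> (rectified) camera frame
    img = P_rect @ cam                # camera frame -> image plane, 3x8
    return img[:2] / img[2]           # 2x8 pixel coordinates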
In our experiments, we evaluate a representative set of
state-of-the-art systems using our benchmarks and novel
metrics. Perhaps not surprisingly, many algorithms that
do well on established datasets such as Middlebury [41, 2]
struggle on our benchmark. We conjecture that this might
be due to their assumptions which are violated in our sce-
narios, as well as overfitting to a small set of training (test)
images.
In addition to the benchmarks, we provide MAT-
LAB/C++ development kits for easy access. We also main-
tain an up-to-date online evaluation server (www.cvlibs.net/datasets/kitti). We hope that
our efforts will help increase the impact that visual recogni-
tion systems have in robotics applications.
2. Challenges and Methodology
Generating large-scale and realistic evaluation bench-
marks for the aforementioned tasks poses a number of chal-
lenges, including the collection of large amounts of data in
real time, the calibration of diverse sensors working at dif-
ferent rates, the generation of ground truth minimizing the
amount of supervision required, the selection of the appropriate sequences and frames for each benchmark, as well as
the development of metrics for each task. In this section we
discuss how we tackle these challenges.
2.1. Sensors and Data Acquisition
We equipped a standard station wagon with two color
and two grayscale PointGrey Flea2 video cameras (10 Hz,
resolution: 1392 × 512 pixels, opening: 90° × 35°), a Velo-
dyne HDL-64E 3D laser scanner (10 Hz, 64 laser beams,
range: 100 m), a GPS/IMU localization unit with RTK cor-
rection signals (open sky localization errors < 5 cm) and a
powerful computer running a real-time database [22].
We mounted all our cameras (i.e., two units, each com-
posed of a color and a grayscale camera) on top of our vehi-
cle. We placed one unit on the left side of the rack, and the
other on the right side. Our camera setup is chosen such
that we obtain a baseline of roughly 54 cm between the
same type of cameras and that the distance between color
and grayscale cameras is minimized (6 cm). We believe
this is a good setup since color images are very useful for
tasks such as segmentation and object detection, but provide
lower contrast and sensitivity compared to their grayscale
counterparts, which is of key importance in stereo matching
and optical flow estimation.
We use a Velodyne HDL-64E unit, as it is one of the few
sensors available that can provide accurate 3D information
from moving platforms. In contrast, structured-light sys-
tems such as the Microsoft Kinect do not work in outdoor
scenarios and have a very limited sensing range. To compensate for egomotion in the 3D laser measurements, we use the position information from our GPS/IMU system.
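Conceptually, this compensation transforms every laser return into a common reference frame using the vehicle pose at its capture time. The following sketch illustrates the idea; the pose_at interpolation function and the argument layout are assumptions made for this example, not part of our recording software.

import numpy as np

def compensate_scan(points, timestamps, pose_at, t_ref):
    # Undistort one laser sweep using GPS/IMU poses (illustrative sketch).
    # points:     Nx3 array in the sensor frame at each point's capture time
    # timestamps: N capture times (one sweep takes about 0.1 s at 10 Hz)
    # pose_at:    callable returning a 4x4 world-from-sensor pose at a given
    #             time, e.g. by interpolating the GPS/IMU stream (assumed)
    # t_ref:      reference time whose frame all points are mapped into
    T_ref_inv = np.linalg.inv(pose_at(t_ref))
    out = np.zeros((len(points), 3))
    for i, (p, t) in enumerate(zip(points, timestamps)):
        p_world = pose_at(t) @ np.append(p, 1.0)   # sensor -> world at time t
        out[i] = (T_ref_inv @ p_world)[:3]         # world -> reference frame
    return out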
2.2. Sensor Calibration
Accurate sensor calibration is key for obtaining reliable
ground truth. Our calibration pipeline proceeds as follows:
First, we calibrate the four video cameras intrinsically and
extrinsically and rectify the input images. We then find the
3D rigid motion parameters which relate the coordinate systems of the laser scanner, the localization unit and the refer-
ence camera. While our Camera-to-Camera and GPS/IMU-
to-Velodyne registration methods are fully automatic, the
Velodyne-to-Camera calibration requires the user to manu-
ally select a small number of correspondences between the
laser and the camera images. This was necessary as existing
techniques for this task are not accurate enough to compute
ground truth estimates.
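For intuition, producing such ground truth amounts to chaining the calibrated transforms: a point measured by the laser scanner can be mapped into the rectified reference camera image as sketched below. The matrix names (T_cam_velo, R_rect, P_rect) are illustrative assumptions rather than a definitive file format.

import numpy as np

def velodyne_to_image(X_velo, T_cam_velo, R_rect, P_rect):
    # Map a 3D point from Velodyne coordinates into the rectified reference
    # camera image by chaining the calibrated transforms (sketch only).
    # T_cam_velo: 4x4 rigid transform, Velodyne -> reference camera
    # R_rect:     4x4 rectifying rotation of the reference camera
    # P_rect:     3x4 projection matrix of the rectified camera
    X = np.append(np.asarray(X_velo, dtype=float), 1.0)   # homogeneous point
    x = P_rect @ (R_rect @ (T_cam_velo @ X))               # chained mapping
    return x[:2] / x[2]                                    # pixel coordinates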
Camera-to-Camera calibration. To automatically cali-
brate the intrinsic and extrinsic parameters of the cameras,
we mounted checkerboard patterns onto the walls of our
garage and detect corners in our calibration images. Based
on gradient information and discrete energy-minimization,
we assign corners to checkerboards, match them between