<div align="center">
<h1>🤗 HE-Drive</h1>
<h2> Human-Like End-to-End Driving with Vision Language Models</h2> <br>
<strong>We will open-source the complete code after the paper is accepted!</strong> <br><br>
<a href='https://arxiv.org/abs/2410.05051'><img src='https://img.shields.io/badge/arXiv-HE_Drive-green' alt='arxiv'></a>
<a href='https://jmwang0117.github.io/HE-Drive/'><img src='https://img.shields.io/badge/Project_Page-HE_Drive-green' alt='Project Page'></a>
</div>
## 📢 News
- [2024/10/08]: 🔥 We released the HE-Drive paper on arXiv!
<br>
## 📜 Introduction
**HE-Drive** is a groundbreaking end-to-end autonomous driving system that prioritizes human-like driving characteristics, ensuring both temporal consistency and comfort in the generated trajectories. It combines sparse perception, which extracts key 3D spatial representations, with a DDPM-based motion planner that generates multi-modal trajectories and a VLM-guided trajectory scorer that selects the most comfortable option. Compared with existing solutions, HE-Drive significantly reduces collision rates, improves computational speed, and delivers the most comfortable driving experience on real-world data.
<p align="center">
<img src="misc/overview.png" width = 100% height = 100%/>
</p>
<br>
<p align="center">
<img src="misc/scoring.png" width = 100% height = 100%/>
</p>
<br>
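The control flow described above can be summarized in a short sketch. This is purely illustrative pseudocode under assumed names (`denoise_step`, `vlm_comfort_score` are hypothetical stand-ins), not the released implementation:

```python
import numpy as np

# Illustrative sketch of the HE-Drive pipeline: DDPM denoising produces
# multi-modal candidate trajectories; a scorer picks the most comfortable one.
HORIZON, STEPS, MODES = 6, 50, 3  # waypoints, denoising steps, trajectory modes

def denoise_step(traj, t, cond):
    """Stub for one DDPM reverse step conditioned on perception features."""
    return traj - 0.02 * traj + 0.001 * np.random.randn(*traj.shape)

def vlm_comfort_score(traj):
    """Stub for the VLM-guided scorer; here, low jerk stands in for comfort."""
    jerk = np.diff(traj, n=3, axis=0)
    return -np.abs(jerk).sum()

def plan(perception_features):
    candidates = []
    for _ in range(MODES):
        traj = np.random.randn(HORIZON, 2)      # start from Gaussian noise
        for t in reversed(range(STEPS)):        # iterative DDPM denoising
            traj = denoise_step(traj, t, perception_features)
        candidates.append(traj)
    return max(candidates, key=vlm_comfort_score)  # most comfortable trajectory

best = plan(perception_features=None)
print(best.shape)  # (6, 2): x/y waypoints of the selected trajectory
```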
## 📚 Citing
```bibtex
@article{wang2024he,
  title={HE-Drive: Human-Like End-to-End Driving with Vision Language Models},
  author={Wang, Junming and Zhang, Xingyu and Xing, Zebin and Gu, Songen and Guo, Xiaoyang and Hu, Yang and Song, Ziying and Zhang, Qian and Long, Xiaoxiao and Yin, Wei},
  journal={arXiv preprint arXiv:2410.05051},
  year={2024}
}
```
Please star ⭐️ this project if it helps you. We put great effort into developing and maintaining it 😊.
## 🛠️ Installation
> [!NOTE]
> Installation steps follow [SparseDrive](https://github.com/swc-17/SparseDrive)
### Set up a new virtual environment
```bash
conda create -n hedrive python=3.8 -y
conda activate hedrive
```
### Install dependency packages
```bash
hedrive_path="path/to/hedrive"
cd ${hedrive_path}
pip3 install --upgrade pip
pip3 install torch==1.13.0+cu116 torchvision==0.14.0+cu116 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu116
pip3 install -r requirement.txt
```
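After the install finishes, a quick check (generic PyTorch usage, not specific to this repo) confirms the CUDA 11.6 build is active:

```python
import torch

print(torch.__version__)          # expected: 1.13.0+cu116
print(torch.cuda.is_available())  # should be True with a working CUDA setup
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # your GPU model
```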
### Compile the deformable_aggregation CUDA op
```bash
cd projects/mmdet3d_plugin/ops
python3 setup.py develop
cd ../../../
```
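To verify the op compiled, try importing the plugin ops package from `${hedrive_path}`; the exact symbol names live in the ops `setup.py`, so treat this as a sketch:

```python
# Run from the repository root so the `projects` package is on sys.path.
# The package path below is an assumption; check the ops setup.py if it fails.
try:
    from projects.mmdet3d_plugin import ops  # noqa: F401
    print("deformable_aggregation op importable")
except ImportError as err:
    print(f"CUDA op not built correctly: {err}")
```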
### Prepare the data
Download the [nuScenes dataset](https://www.nuscenes.org/nuscenes#download) and the CAN bus expansion, put the CAN bus expansion in `/path/to/nuscenes`, and create symbolic links:
```bash
cd ${hedrive_path}
mkdir data
ln -s path/to/nuscenes ./data/nuscenes
```
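A small check of the symlinked layout can save a failed conversion run later; the subdirectory names below are the standard nuScenes ones plus the CAN bus expansion:

```python
from pathlib import Path

root = Path("data/nuscenes")
# Standard nuScenes folders, plus can_bus from the CAN bus expansion.
for sub in ("samples", "sweeps", "maps", "v1.0-trainval", "can_bus"):
    print(f"{sub:15s} {'ok' if (root / sub).exists() else 'MISSING'}")
```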
Pack the meta-information and labels of the dataset and generate the required pkl files under `data/infos`. Note that the data converter also generates `map_annos`, with a default `roi_size` of (30, 60); if you want a different range, modify `roi_size` in `tools/data_converter/nuscenes_converter.py`.
```bash
sh scripts/create_data.sh
```
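To spot-check the result, load one of the generated info files; the filename below follows the SparseDrive-style converter convention and may differ in your checkout:

```python
import pickle

# Assumed output name from the SparseDrive-style converter; adjust if needed.
with open("data/infos/nuscenes_infos_train.pkl", "rb") as f:
    infos = pickle.load(f)

# Depending on converter version, this is a list of per-sample dicts or a
# dict with an "infos" key; handle both when inspecting.
samples = infos if isinstance(infos, list) else infos["infos"]
print(len(samples), "samples")
print(sorted(samples[0].keys()))
```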
### Prepare the 3D representation
> [!NOTE]
> Generate the 3D representation using the SparseDrive second-stage checkpoint!
### Commence training
```bash
# train
sh scripts/train.sh
```
### Install Ollama and Llama 3.2-Vision 11B
> [!NOTE]
> Download Ollama 0.4, then run:
```bash
ollama run llama3.2-vision:11b
```
> [!IMPORTANT]
> Llama 3.2 Vision 11B requires at least 8 GB of VRAM.
>
> Please prepare at least 10 sets of VQA templates to complete the dialogue, focusing the Llama knowledge domain on driving-style assessment, as in the sketch below.
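As a starting point for those templates, the Ollama Python client (`pip install ollama`) can send a camera frame plus one driving-style question to the model. The prompt text and image path here are placeholders, not the paper's actual templates:

```python
import ollama  # requires a running Ollama >= 0.4 server

# Placeholder VQA template: one of the >= 10 driving-style prompts you prepare.
prompt = (
    "You are a driving-style assessor. Given this front-camera frame, rate "
    "the comfort of the planned trajectory from 1 to 10 and explain briefly."
)

response = ollama.chat(
    model="llama3.2-vision:11b",
    messages=[{
        "role": "user",
        "content": prompt,
        "images": ["path/to/front_cam.jpg"],  # placeholder image path
    }],
)
print(response["message"]["content"])
```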
### Commence testing
```bash
# test
sh scripts/test.sh
```
## 💽 Dataset
- [x] nuScenes
- [x] Real-World Data
- [x] OpenScene/NAVSIM
## 🙏 Acknowledgement
Many thanks to these excellent open source projects:
- [SparseDrive](https://github.com/swc-17/SparseDrive)
- [DP](https://github.com/real-stanford/diffusion_policy)
- [DP3](https://github.com/YanjieZe/3D-Diffusion-Policy)
- [OpenScene](https://github.com/OpenDriveLab/OpenScene)
- [NAVSIM](https://github.com/autonomousvision/navsim)