<div align="center">
<h1>🤗 HE-Drive</h1>
<h2> Human-Like End-to-End Driving with Vision Language Models</h2> <br>
<strong>We will open-source the complete code after the paper is accepted!</strong> <br><br>
<a href='https://arxiv.org/abs/2410.05051'><img src='https://img.shields.io/badge/arXiv-HE_Drive-green' alt='arxiv'></a>
<a href='https://jmwang0117.github.io/HE-Drive/'><img src='https://img.shields.io/badge/Project_Page-HE_Drive-green' alt='Project Page'></a>
</div>
## 📢 News
- [2024/10/08]: 🔥 We release the HE-Drive paper on arXiv!
<br>
## 📚 Introduction
**HE-Drive** is an end-to-end autonomous driving system that prioritizes human-like driving behavior, generating trajectories that are both temporally consistent and comfortable. It combines three components: sparse perception, which extracts the key 3D spatial representations; a DDPM-based motion planner, which generates multi-modal trajectories conditioned on those representations; and a VLM-guided trajectory scorer, which selects the most comfortable candidate. Compared with existing solutions, HE-Drive significantly reduces collision rates and improves computational speed, while delivering the most comfortable driving experience on real-world data. A minimal sketch of the planner's sampling loop follows the figures below.
<p align="center">
<img src="misc/overview.png" width="100%"/>
</p>
<br>
<p align="center">
<img src="misc/scoring.png" width="100%"/>
</p>
<br>
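To make the planner stage above concrete, here is a minimal sketch of reverse-diffusion trajectory sampling. It assumes a trained noise-prediction network `denoise_net(x_t, t, cond)`; the step count, noise schedule, and tensor shapes are illustrative defaults, not HE-Drive's actual settings or API.
```python
# Hypothetical sketch of a DDPM trajectory sampler: a trained noise-prediction
# network iteratively denoises Gaussian noise into candidate trajectories.
import torch

T = 100                                    # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)      # standard linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def sample_trajectories(denoise_net, cond, n_modes=6, horizon=6, dim=2):
    """Reverse diffusion: start from noise, denoise step by step.

    denoise_net(x_t, t, cond) is assumed to predict the added noise eps.
    Returns n_modes candidate trajectories of shape (n_modes, horizon, dim).
    """
    x = torch.randn(n_modes, horizon, dim)           # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps = denoise_net(x, torch.full((n_modes,), t), cond)
        # DDPM posterior mean of x_{t-1} given x_t
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps) / torch.sqrt(alphas[t])
        if t > 0:                                    # add noise except at t = 0
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x
```
The multi-modal candidates produced this way are then handed to the VLM-guided scorer, which selects the most comfortable one.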
## 📝 Citing
```bibtex
@article{wang2024he,
  title={HE-Drive: Human-Like End-to-End Driving with Vision Language Models},
  author={Wang, Junming and Zhang, Xingyu and Xing, Zebin and Gu, Songen and Guo, Xiaoyang and Hu, Yang and Song, Ziying and Zhang, Qian and Long, Xiaoxiao and Yin, Wei},
  journal={arXiv preprint arXiv:2410.05051},
  year={2024}
}
```
Please star ⭐️ this project if it helps you. We put great effort into developing and maintaining it 😊.
## 🛠️ Installation
> [!NOTE]
> Installation steps follow [SparseDrive](https://github.com/swc-17/SparseDrive)
### Set up a new virtual environment
```bash
conda create -n hedrive python=3.8 -y
conda activate hedrive
```
### Install dependency packages
```bash
hedrive_path="path/to/hedrive"
cd ${hedrive_path}
pip3 install --upgrade pip
pip3 install torch==1.13.0+cu116 torchvision==0.14.0+cu116 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu116
pip3 install -r requirement.txt
```
### Compile the deformable_aggregation CUDA op
```bash
cd projects/mmdet3d_plugin/ops
python3 setup.py develop
cd ../../../
```
### Prepare the data
Download the [NuScenes dataset](https://www.nuscenes.org/nuscenes#download) and the CAN bus expansion, place the CAN bus expansion data in /path/to/nuscenes, and create symbolic links:
```bash
cd ${hedrive_path}
mkdir data
ln -s path/to/nuscenes ./data/nuscenes
```
Pack the dataset meta-information and labels, and generate the required pkl files in data/infos. Note that map_annos are also generated in data_converter, with a default roi_size of (30, 60); if you want a different range, modify roi_size in tools/data_converter/nuscenes_converter.py.
```bash
sh scripts/create_data.sh
```
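For reference, the map range is controlled by a single tuple in the converter. The excerpt below is illustrative only, assuming the file keeps SparseDrive's layout; check the actual source before editing.
```python
# Illustrative excerpt; assumed location: tools/data_converter/nuscenes_converter.py
# Default map-annotation region of interest in meters. Reading the two values
# as (lateral, longitudinal) extent is an assumption here.
roi_size = (30, 60)
# For a wider range, e.g. roi_size = (60, 120), change this value and
# re-run `sh scripts/create_data.sh` to regenerate the pkl files.
```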
### Prepare the 3D representation
> [!NOTE]
> Generate the 3D representation using the SparseDrive second-stage checkpoint!
### Commence training
```bash
# train
sh scripts/train.sh
```
### Install Ollama and Llama 3.2-Vision 11B
> [!NOTE]
> Download Ollama 0.4, then run:
```bash
ollama run llama3.2-vision:11b
```
> [!IMPORTANT]
> Llama 3.2 Vision 11B requires at least 8GB of VRAM.
>
> Please prepare at least 10 sets of VQA templates to complete the dialogue, focusing the Llama knowledge domain on driving-style assessment; a minimal example query is sketched below.
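As a starting point for such templates, the sketch below sends one driving-style VQA query through the Ollama Python client (`pip install ollama`). The prompt wording and image path are illustrative assumptions, not HE-Drive's actual templates.
```python
# Minimal sketch of one driving-style VQA query via the Ollama Python client.
import ollama

# Hypothetical template: ask the VLM to rate trajectory comfort from an image.
VQA_TEMPLATE = (
    "You are a driving-style assessor. Given this front-camera view with the "
    "candidate trajectory overlaid, rate comfort from 1 (harsh) to 5 (smooth) "
    "and justify your rating briefly."
)

response = ollama.chat(
    model="llama3.2-vision:11b",  # 11b tag as published in the Ollama library
    messages=[{
        "role": "user",
        "content": VQA_TEMPLATE,
        "images": ["samples/front_cam_with_traj.png"],  # hypothetical path
    }],
)
print(response["message"]["content"])
```
Repeating such queries over a set of templates, and aggregating the scores, narrows the model's focus to driving-style assessment as described above.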
### Commence testing
```bash
# test
sh scripts/test.sh
```
## 💽 Dataset
- [x] nuScenes
- [x] Real-World Data
- [x] OpenScene/NAVSIM
## 🙏 Acknowledgement
Many thanks to these excellent open source projects:
- [SparseDrive](https://github.com/swc-17/SparseDrive)
- [DP](https://github.com/real-stanford/diffusion_policy)
- [DP3](https://github.com/YanjieZe/3D-Diffusion-Policy)
- [OpenScene](https://github.com/OpenDriveLab/OpenScene)
- [NAVSIM](https://github.com/autonomousvision/navsim)