# bottom-up-attention.pytorch
This repository contains a **PyTorch** reimplementation of the [bottom-up-attention](https://github.com/peteanderson80/bottom-up-attention) project based on *Caffe*.
We use [Detectron2](https://github.com/facebookresearch/detectron2) as the backend to provide complete functionality, including training, testing, and feature extraction. Furthermore, we migrate the pre-trained Caffe-based model from the original repository, which can extract **the same visual features** as the original model (with deviation < 0.01).
Some example object and attribute predictions for salient image regions are illustrated below. The script to obtain the following visualizations can be found [here](utils/visualize.ipynb).
![example-image](datasets/demo/example_image.jpg?raw=true)
## Table of Contents
0. [Prerequisites](#Prerequisites)
1. [Training](#Training)
2. [Testing](#Testing)
3. [Feature Extraction](#Feature-Extraction)
4. [Pre-trained models](#Pre-trained-models)
## Prerequisites
#### Requirements
- [Python](https://www.python.org/downloads/) >= 3.6
- [PyTorch](http://pytorch.org/) >= 1.4
- [CUDA](https://developer.nvidia.com/cuda-toolkit) >= 9.2 and [cuDNN](https://developer.nvidia.com/cudnn)
- [Apex](https://github.com/NVIDIA/apex.git)
- [Detectron2](https://github.com/facebookresearch/detectron2)
- [Ray](https://github.com/ray-project/ray)
- [OpenCV](https://opencv.org/)
- [Pycocotools](https://github.com/cocodataset/cocoapi)
Note that most of the requirements above are needed for Detectron2.
#### Installation
1. Clone the project including the required version (v0.2.1) of Detectron2
```bash
# clone the repository including Detectron2 (@be792b9)
$ git clone --recursive https://github.com/MILVLG/bottom-up-attention.pytorch
```
2. Install Detectron2
```bash
$ cd detectron2
$ pip install -e .
$ cd ..
```
**We recommend using Detectron2 v0.2.1 (@be792b9) as the backend for this project, which is cloned in step 1. Newer Detectron2 versions should also be compatible with this project unless their interfaces have changed (we have tested v0.3 with PyTorch 1.5).**
3. Compile the remaining tools using the following script:
```bash
# install apex
$ git clone https://github.com/NVIDIA/apex.git
$ cd apex
$ python setup.py install
$ cd ..
# install the remaining modules
$ python setup.py build develop
$ pip install ray
```
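Since newer Detectron2 releases occasionally change interfaces, it can help to assert a minimum backend version at startup. A minimal sketch (the helper names below are illustrative, not part of this repository):

```python
# Sketch of a minimal version gate; compares dotted version strings
# numerically rather than lexically (so "0.10" > "0.9").
def version_tuple(version):
    # keep only the leading numeric components, e.g. "0.2.1" -> (0, 2, 1)
    return tuple(int(part) for part in version.split(".")[:3])

def backend_is_supported(installed, required="0.2.1"):
    return version_tuple(installed) >= version_tuple(required)
```

In practice this could be called with `detectron2.__version__` before building the model, failing fast with a clear message instead of a cryptic interface error later.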
#### Setup
If you want to train or test the model, you need to download the images and annotation files of the Visual Genome (VG) dataset. **If you only need to extract visual features using the pre-trained model, you can skip this part**.
The original VG images ([part1](https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip) and [part2](https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip)) should be downloaded and unzipped into the `datasets` folder.
The annotation files generated by the original repository need to be transformed into the COCO format required by Detectron2. The preprocessed annotation files can be downloaded [here](https://awma1-my.sharepoint.com/:u:/g/personal/yuz_l0_tn/EWpiE_5PvBdKiKfCi0pBx_EB5ONo8D8XABUz7tWcnltCrw?e=xIeW23) and unzipped into the `datasets` folder.
Finally, the `datasets` folder should have the following structure:
```
|-- datasets
|-- vg
| |-- images
| | |-- VG_100K
| | | |-- 2.jpg
| | | |-- ...
| | |-- VG_100K_2
| | | |-- 1.jpg
| | | |-- ...
| |-- annotations
| | |-- train.json
| | |-- val.json
```
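After unzipping, a quick sanity check of the layout above can save a failed run later. A sketch (the path names follow the tree shown; the helper itself is not part of this repository):

```python
from pathlib import Path

def missing_vg_paths(root="datasets/vg"):
    """Return the expected dataset paths (per the tree above) that do not exist."""
    root = Path(root)
    expected = [
        root / "images" / "VG_100K",
        root / "images" / "VG_100K_2",
        root / "annotations" / "train.json",
        root / "annotations" / "val.json",
    ]
    return [str(p) for p in expected if not p.exists()]

if __name__ == "__main__":
    for path in missing_vg_paths():
        print("missing:", path)
```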
## Training
The following script will train a bottom-up-attention model on the `train` split of VG. *We are still working on this part to reproduce the same results as the Caffe version*.
```bash
$ python3 train_net.py --mode detectron2 \
--config-file configs/bua-caffe/train-bua-caffe-r101.yaml \
--resume
```
1. `mode = {'caffe', 'detectron2'}` refers to the training mode. Only the `detectron2` mode is supported for training, since we see no need to train a new model in the `caffe` mode.
2. `config-file` refers to the configuration file of the model.
3. `resume` is a flag to resume training from a specific checkpoint.
## Testing
Given the trained model, the following script will test the performance on the `val` split of VG:
```bash
$ python3 train_net.py --mode caffe \
--config-file configs/bua-caffe/test-bua-caffe-r101.yaml \
--eval-only
```
1. `mode = {'caffe', 'detectron2'}` refers to the model mode. Use the `caffe` mode for the model converted from Caffe, and the `detectron2` mode for models trained with Detectron2.
2. `config-file` refers to the configuration file of the model, which also includes the path to the model weights.
3. `eval-only` is a flag that declares the testing phase.
## Feature Extraction
With highly optimized multi-process parallelism, the following script extracts the bottom-up-attention visual features quickly (about 7 images/s on a workstation with 4 Titan V GPUs and 32 CPU cores).
We also provide a [faster version](extract_features_faster.py) of the feature extraction script, which runs **extremely fast** (about 16 images/s on the same workstation). However, it may leak memory when the computing capabilities of the GPUs and CPUs are mismatched (more details and some matched configurations are given [here](https://github.com/MILVLG/bottom-up-attention.pytorch/pull/41)).
To use this faster version, simply replace `extract_features.py` with `extract_features_faster.py` in the following script. **MAKE SURE YOU HAVE ENOUGH CPUS.**
```bash
$ python3 extract_features.py --mode caffe \
--num-cpus 32 --gpus '0,1,2,3' \
--extract-mode roi_feats \
--min-max-boxes '10,100' \
--config-file configs/bua-caffe/extract-bua-caffe-r101.yaml \
--image-dir <image_dir> --bbox-dir <out_dir> --out-dir <out_dir>
```
1. `mode = {'caffe', 'detectron2'}` refers to the model mode. Use the `caffe` mode for the model converted from Caffe, and the `detectron2` mode for models trained with Detectron2. `'caffe'` is the default value.
2. `num-cpus` refers to the number of CPU cores used to accelerate CPU computation. **0** means using all available CPUs, and **1** is the default value.
3. `gpus` refers to the IDs of the GPUs to use. **'0'** is the default value.
4. `config-file` refers to the configuration file of the model, which also includes the path to the model weights.
5. `extract-mode` refers to the feature extraction mode, one of {`roi_feats`, `bboxes`, `bbox_feats`}.
6. `min-max-boxes` refers to the minimum and maximum number of features (boxes) to extract.
7. `image-dir` refers to the input image directory.
8. `bbox-dir` refers to the directory of pre-extracted bboxes. It is only used when `extract-mode` is set to `'bbox_feats'`.
9. `out-dir` refers to the output feature directory.
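Downstream code typically consumes the extracted features one image at a time. Assuming each output is a compressed `.npz` file whose arrays include the ROI features and their boxes (the key names `x` and `bbox` below are assumptions; inspect `np.load(path).files` on your own output first), a minimal loader could look like:

```python
import numpy as np

def load_roi_feats(path):
    # Key names are assumptions -- check np.load(path).files for your version.
    with np.load(path, allow_pickle=True) as data:
        feats = data["x"]      # e.g. (num_boxes, 2048) for an R-101 backbone
        boxes = data["bbox"]   # e.g. (num_boxes, 4) as (x1, y1, x2, y2)
    return feats, boxes
```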
Using the same pre-trained model, we provide an alternative *two-stage* strategy for extracting visual features, which results in (slightly) more accurate bboxes and visual features:
```bash
# extract bboxes only:
$ python3 extract_features.py --mode caffe \
         --num-cpus 32 --gpus '0,1,2,3' \
--extract-mode bboxes \
--config-file configs/bua-caffe/extract-bua-caffe-r101.yaml \
--image-dir <image_dir> --out-dir <out_dir> --resume
# extract visual features with the pre-extracted bboxes:
$ python3 extract_features.py --mode caffe \
         --num-cpus 32 --gpus '0,1,2,3' \
--extract-mode bbox_feats \
--config-file configs/bua-caffe/extract-bua-caffe-r101.yaml \
--image-dir <image_dir> --bbox-dir <bbox_dir> --out-dir <out_dir> --resume
```