# Deep Image Retrieval
This project contains the models and the evaluation scripts (in Python3 and PyTorch 1.0+) of the papers:
**[1] End-to-end Learning of Deep Visual Representations for Image Retrieval**
Albert Gordo, Jon Almazan, Jerome Revaud, Diane Larlus, IJCV 2017 [\[PDF\]](https://arxiv.org/abs/1610.07940)
**[2] Learning with Average Precision: Training Image Retrieval with a Listwise Loss**
Jerome Revaud, Jon Almazan, Rafael S. Rezende, Cesar de Souza, ICCV 2019 [\[PDF\]](https://arxiv.org/abs/1906.07589)
Both papers tackle the problem of image retrieval and explore different ways to learn deep visual representations for this task. In both cases, a CNN is used to extract a feature map that is aggregated into a compact, fixed-length representation by a global-aggregation layer*. Finally, this representation is projected with a fully-connected (FC) layer and L2-normalized, so that images can be efficiently compared with the dot product.
![dir_network](https://user-images.githubusercontent.com/228798/59742085-aae19f80-9221-11e9-8063-e5f2528c304a.png)
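The pipeline above can be sketched in a few lines of PyTorch. This is an illustrative stand-in, not the actual dirtorch code: the backbone is omitted, plain average pooling stands in for the global-aggregation layer, and the dimensions and class name are assumptions.

```python
import torch
import torch.nn.functional as F

class GlobalDescriptor(torch.nn.Module):
    """Illustrative sketch: feature map -> global pooling -> FC -> L2 norm."""
    def __init__(self, in_dim=2048, out_dim=2048):
        super().__init__()
        self.fc = torch.nn.Linear(in_dim, out_dim)  # learned projection

    def forward(self, fmap):                   # fmap: (B, C, H, W) CNN feature map
        pooled = fmap.mean(dim=(2, 3))         # global average pooling (stand-in for GeM/MAC)
        desc = self.fc(pooled)                 # project to the final embedding
        return F.normalize(desc, p=2, dim=1)   # L2-normalize: similarity = dot product

fmap = torch.randn(2, 2048, 7, 7)              # a fake feature map for two images
desc = GlobalDescriptor()(fmap)
print(desc.shape)                              # (2, 2048) fixed-length descriptors
print((desc ** 2).sum(dim=1))                  # each row has unit L2 norm
```

Because the descriptors are unit-norm, the dot product between two of them is their cosine similarity, which is what makes large-scale comparison cheap.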
All components in this network, including the aggregation layer, are differentiable, which makes it end-to-end trainable for the end task. In [1], a Siamese architecture that combines three streams with a triplet loss was proposed to train this network. In [2], this work was extended by replacing the triplet loss with a new loss that directly optimizes for Average Precision.
![Losses](https://user-images.githubusercontent.com/228798/59742025-7a9a0100-9221-11e9-9d58-1494716e9071.png)
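The triplet loss used in [1] can be sketched with PyTorch's built-in `TripletMarginLoss`. This is a minimal illustration on random descriptors; the margin value here is an assumption, and the real training additionally involves a three-stream Siamese setup with hard-example mining:

```python
import torch
import torch.nn.functional as F

# Triplet loss over L2-normalized descriptors: pulls the query (anchor) towards
# a relevant image (positive) and pushes it away from an irrelevant one (negative).
loss_fn = torch.nn.TripletMarginLoss(margin=0.1)  # margin is illustrative

anchor   = F.normalize(torch.randn(8, 2048), dim=1)
positive = F.normalize(torch.randn(8, 2048), dim=1)
negative = F.normalize(torch.randn(8, 2048), dim=1)

loss = loss_fn(anchor, positive, negative)  # scalar, differentiable end-to-end
print(loss.item())
```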
\* Originally, [1] used R-MAC pooling [3] as the global-aggregation layer. However, due to its efficiency and better performance, we have replaced the R-MAC pooling layer with the Generalized-mean pooling layer (GeM) proposed in [4]. You can find the original implementation of [1] in Caffe following [this link](https://europe.naverlabs.com/Research/Computer-Vision/Learning-Visual-Representations/Deep-Image-Retrieval/).
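GeM computes, per channel, the p-th power mean of the feature-map activations, with p learned end-to-end: p=1 recovers average pooling, and large p approaches max pooling (MAC). A minimal sketch (constructor defaults are assumptions, not the dirtorch values):

```python
import torch

class GeM(torch.nn.Module):
    """Generalized-mean pooling [4]: ((1/HW) * sum x^p)^(1/p) per channel."""
    def __init__(self, p=3.0, eps=1e-6):
        super().__init__()
        self.p = torch.nn.Parameter(torch.tensor(p))  # p is learned end-to-end
        self.eps = eps

    def forward(self, x):                             # x: (B, C, H, W)
        x = x.clamp(min=self.eps).pow(self.p)
        return x.mean(dim=(2, 3)).pow(1.0 / self.p)   # (B, C)

fmap = torch.rand(1, 4, 5, 5)
print(GeM(p=1.0)(fmap))     # equals plain average pooling
print(GeM(p=100.0)(fmap))   # approaches per-channel max pooling (MAC)
```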
## Pre-requisites
In order to run this toolbox you will need:
- Python3 (tested with Python 3.7.3)
- PyTorch (tested with version 1.4)
- The following packages: numpy, matplotlib, tqdm, scikit-learn
With conda you can run the following commands:
```
conda install numpy matplotlib tqdm scikit-learn
conda install pytorch torchvision -c pytorch
```
## Installation
```
# Download the code
# Create env variables
# cd the project
export DIR_ROOT=$PWD
export DB_ROOT=/PATH/TO/YOUR/DATASETS
# for example: export DB_ROOT=$PWD/dirtorch/data/datasets
```
## Evaluation
### Pre-trained models
The table below contains the pre-trained models that we provide with this library, together with their mAP performance on some of the most well-known image retrieval benchmarks: [Oxford5K](http://www.robots.ox.ac.uk/~vgg/data/oxbuildings/), [Paris6K](http://www.robots.ox.ac.uk/~vgg/data/parisbuildings/), and their Revisited versions ([ROxford5K and RParis6K](https://github.com/filipradenovic/revisitop)).
| Model | Oxford5K | Paris6K | ROxford5K (med/hard) | RParis6K (med/hard) |
|--- |:-:|:-:|:-:|:-:|
| [Resnet101-TL-MAC](https://drive.google.com/file/d/13MUGNwn_CYGZvqDBD8FGD8fVYxThsSDg/view?usp=sharing) | 85.6 | 90.1 | 63.3 / 35.7 | 76.6 / 55.5 |
| [Resnet101-TL-GeM](https://drive.google.com/open?id=1vhm1GYvn8T3-1C4SPjPNJOuTU9UxKAG6) | 85.7 | **93.4** | 64.5 / 40.9 | 78.8 / 59.2 |
| [Resnet50-AP-GeM](https://drive.google.com/file/d/1oPtE_go9tnsiDLkWjN4NMpKjh-_md1G5/view?usp=sharing) | 87.7 | 91.9 | 65.5 / 41.0 | 77.6 / 57.1 |
| [Resnet101-AP-GeM](https://drive.google.com/open?id=1UWJGDuHtzaQdFhSMojoYVQjmCXhIwVvy) | **89.1** | **93.0** | **67.1** / **42.3** | **80.3** / **60.9** |
| [Resnet101-AP-GeM-LM18](https://drive.google.com/open?id=1r76NLHtJsH-Ybfda4aLkUIoW3EEsi25I)** | 88.1 | **93.1** | 66.3 / **42.5** | **80.2** / **60.8** |
The name of the model encodes the backbone architecture of the network and the loss that has been used to train it (TL for triplet loss and AP for Average Precision loss). All models use **Generalized-mean pooling (GeM)** [4] as the global pooling mechanism, except for the model in the first row, which uses MAC [3] \(i.e. max-pooling), and have been trained on the **Landmarks-clean** [1] dataset (the clean version of the [Landmarks dataset](http://sites.skoltech.ru/compvision/projects/neuralcodes/)) by directly **fine-tuning from ImageNet**. These numbers have been obtained using a **single resolution** and applying **whitening** to the output features (which has also been learned on Landmarks-clean). For a detailed explanation of all the hyper-parameters see [1] and [2] for the triplet-loss and AP-loss models, respectively.
** For the sake of completeness, we have added an extra model, `Resnet101-AP-GeM-LM18`, which has been trained on the [Google-Landmarks Dataset](https://www.kaggle.com/google/google-landmarks-dataset), a large dataset consisting of more than 1M images and 15K classes.
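The whitening applied to the output features is, in essence, a PCA-whitening transform learned on Landmarks-clean descriptors, with a power applied to the eigenvalue scaling (as the `--whitenp` option below suggests). The sketch illustrates the idea on synthetic data; the function names and all numeric choices are assumptions, not the dirtorch implementation:

```python
import numpy as np

def learn_whitening(X):
    """Fit PCA parameters on an (N, D) matrix of training descriptors."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)          # eigh returns ascending order
    order = np.argsort(eigval)[::-1]              # sort eigen-directions descending
    return mean, eigval[order], eigvec[:, order]

def apply_whitening(X, mean, eigval, eigvec, whitenp=0.25):
    """Project, scale each direction by eigval**-whitenp, re-L2-normalize."""
    Xw = (X - mean) @ eigvec * np.power(np.maximum(eigval, 1e-9), -whitenp)
    return Xw / np.linalg.norm(Xw, axis=1, keepdims=True)

rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 64))               # stand-in training descriptors
mean, eigval, eigvec = learn_whitening(train)
out = apply_whitening(rng.normal(size=(5, 64)), mean, eigval, eigvec)
print(np.linalg.norm(out, axis=1))                # rows are unit-norm again
```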
### Reproducing the results
The script `test_dir.py` can be used to evaluate the pre-trained models provided and to reproduce the results above:
```
python -m dirtorch.test_dir --dataset DATASET --checkpoint PATH_TO_MODEL \
[--whiten DATASET] [--whitenp POWER] [--aqe ALPHA-QEXP] \
[--trfs TRANSFORMS] [--gpu ID] [...]
```
- `--dataset`: selects the dataset (e.g. Oxford5K, Paris6K, ROxford5K, RParis6K) [**required**]
- `--checkpoint`: path to the model weights [**required**]
- `--whiten`: applies whitening to the output features [default 'Landmarks_clean']
- `--whitenp`: whitening power [default: 0.25]
- `--aqe`: alpha-query expansion parameters [default: None]
- `--trfs`: input image transformations (can be used to apply multi-scale) [default: None]
- `--gpu`: selects the GPU ID (-1 selects the CPU)
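For context, α-query expansion (the technique behind `--aqe`) re-issues an expanded query built as a similarity-weighted average of the original query descriptor and its top-k neighbors, with weights raised to the power α. A minimal sketch on synthetic descriptors (the function name and the `k`/`alpha` defaults are illustrative, not the script's):

```python
import numpy as np

def alpha_query_expansion(query, database, k=10, alpha=3.0):
    """Replace the query with a similarity-weighted average of itself and its
    top-k neighbors (weights = similarity**alpha), then re-L2-normalize."""
    sims = database @ query                        # dot products (all vectors unit-norm)
    topk = np.argsort(-sims)[:k]                   # indices of the k best matches
    weights = np.concatenate(([1.0], np.maximum(sims[topk], 0) ** alpha))
    expanded = np.vstack([query, database[topk]]).T @ weights
    return expanded / np.linalg.norm(expanded)

rng = np.random.default_rng(0)
db = rng.normal(size=(100, 32))
db /= np.linalg.norm(db, axis=1, keepdims=True)    # L2-normalized database
q = db[0] + 0.1 * rng.normal(size=32)              # noisy copy of entry 0 as query
q /= np.linalg.norm(q)
q_exp = alpha_query_expansion(q, db)
print(np.linalg.norm(q_exp))                       # still unit-norm
```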
For example, to reproduce the results of the Resnet101-AP-GeM model on the RParis6K dataset, download the model `Resnet101-AP-GeM.pt` from [here](https://drive.google.com/open?id=1mi50tG6oXY1eE9yJnmGCPdTmlIjG7mr0) and run:
```
cd $DIR_ROOT
export DB_ROOT=/PATH/TO/YOUR/DATASETS
python -m dirtorch.test_dir --dataset RParis6K \
--checkpoint dirtorch/data/Resnet101-AP-GeM.pt \
--whiten Landmarks_clean --whitenp 0.25 --gpu 0
```
And you should see the following output:
```
>> Evaluation...
* mAP-easy = 0.907568
* mAP-medium = 0.803098
* mAP-hard = 0.608556
```
**Note:** this script integrates an automatic downloader for the Oxford5K, Paris6K, ROxford5K, and RParis6K datasets (kudos to Filip Radenovic ;)). The datasets will be saved in `$DB_ROOT`.
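For reference, the mAP scores above are means over all queries of Average Precision, which accumulates precision at each rank where a relevant image occurs. A minimal sketch of AP for a single ranked list (the official ROxford/RParis protocol additionally handles "junk" images, which is omitted here):

```python
import numpy as np

def average_precision(ranked_relevance):
    """AP of one ranked list: mean of precision@rank over the relevant positions."""
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    cum_hits = np.cumsum(rel)                               # hits so far at each rank
    prec_at_hits = cum_hits[rel == 1] / (np.flatnonzero(rel) + 1)
    return prec_at_hits.mean()

# Relevant results at ranks 1, 3 and 6:
print(average_precision([1, 0, 1, 0, 0, 1]))  # (1/1 + 2/3 + 3/6) / 3
```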
## Feature extractor
You can also use the pre-trained models to extract features from your own datasets or collections of images. For that we provide the script `extract_features.py`:
```
python -m dirtorch.extract_features --dataset DATASET --checkpoint PATH_TO_MODEL \
--output PATH_TO_FILE [--whiten DATASET] [--whitenp POWER] \
[--trfs TRANSFORMS] [--gpu ID] [...]
```
where `--output` is used to specify the destination where the features will be saved. The rest of the parameters are the same as seen above.
For example, this is how the script can be used to extract a feature representation for each of the images in the RParis6K dataset using the `Resnet101-AP-GeM.pt` model, storing them in `rparis6k_features.npy`:
```
cd $DIR_ROOT
export DB_ROOT=/PATH/TO/YOUR/DATASETS
python -m dirtorch.extract_features --dataset RParis6K \
--checkpoint dirtorch/data/Resnet101-AP-GeM.pt \
--output rparis6k_features.npy \
--whiten Landmarks_clean --whitenp 0.25 --gpu 0
```
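Once extracted, descriptors can be searched with a plain dot product, since they are L2-normalized. The sketch below uses random stand-in features; it assumes the saved file holds an (N, D) array (load your own with `np.load`):

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 64))                     # stand-in for a loaded feature matrix
feats /= np.linalg.norm(feats, axis=1, keepdims=True)  # descriptors are L2-normalized

query = feats[17]                                      # any descriptor can act as a query
scores = feats @ query                                 # cosine similarity == dot product
ranking = np.argsort(-scores)                          # best matches first
print(ranking[:5])                                     # the query itself ranks first
```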
The library also provides a **generic dataset class** (`ImageList`) that allows you to specify the list of images via a simple text file.
```
--dataset 'ImageList("PATH_TO_TEXTFILE" [, "IMAGES_ROOT"])'
```
Each row of the text file should contain a single path to a given image:
```
/PATH/TO/YOUR/DATASET/images/image1.jpg
/PATH/TO/YOUR/DATASET/images/image2.jpg
/PATH/TO/YOUR/DATASET/images/image3.jpg
/PATH/TO/YOUR/DATASET/images/image4.jpg
```