# SPTS: Single-Point Text Spotting
<div>
<a href="https://arxiv.org/abs/2112.07917">[arXiv paper]</a>
</div>
## Description
This is an implementation of [SPTS](https://github.com/shannanyinxiang/SPTS) based on [MMOCR](https://github.com/open-mmlab/mmocr/tree/dev-1.x), [MMCV](https://github.com/open-mmlab/mmcv), and [MMEngine](https://github.com/open-mmlab/mmengine).
Existing scene text spotting (i.e., end-to-end text detection and recognition) methods rely on costly bounding box annotations (e.g., text-line, word-level, or character-level bounding boxes). For the first time, we demonstrate that scene text spotting models can be trained with an extremely low-cost annotation: a single point for each instance. We propose an end-to-end scene text spotting method that treats scene text spotting as a sequence prediction task. Given an image as input, we formulate the desired detection and recognition results as a sequence of discrete tokens and use an auto-regressive Transformer to predict the sequence. The proposed method is simple yet effective and achieves state-of-the-art results on widely used benchmarks. Most significantly, we show that performance is not very sensitive to the position of the point annotation, which means that point annotations are much easier to produce, or even to generate automatically, than bounding boxes, which require precise positions. We believe that such a pioneering attempt indicates a significant opportunity for scene text spotting applications at a much larger scale than previously possible.
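To make the sequence formulation more concrete, here is a toy Bash sketch that turns one text instance (a single point plus its transcription) into integer tokens. It is not the tokenizer used by SPTS or by this project: the bin count `NUM_BINS`, the offset that places character tokens after the coordinate bins, and the helper names `quantize` and `encode_instance` are all hypothetical.

```bash
#!/usr/bin/env bash
# Toy sketch (NOT the SPTS tokenizer): serialize one text instance, i.e. a
# single point plus its transcription, into a sequence of integer tokens.
# NUM_BINS and the character-token offset are hypothetical choices.
set -euo pipefail

NUM_BINS=1000  # hypothetical number of coordinate bins

# quantize COORD SIZE: map a pixel coordinate in [0, SIZE] to a bin in [0, NUM_BINS-1]
quantize() {
  local coord=$1 size=$2
  echo $(( coord * (NUM_BINS - 1) / size ))
}

# encode_instance X Y WIDTH HEIGHT TEXT: print "x_bin y_bin char_tokens..."
encode_instance() {
  local x=$1 y=$2 w=$3 h=$4 text=$5
  local tokens=("$(quantize "$x" "$w")" "$(quantize "$y" "$h")")
  local i code
  for (( i = 0; i < ${#text}; i++ )); do
    # ASCII code of the i-th character, shifted past the coordinate bins
    code=$(printf '%d' "'${text:i:1}")
    tokens+=("$(( NUM_BINS + code ))")
  done
  echo "${tokens[*]}"
}

encode_instance 320 240 640 480 "SPTS"
```

For a point at the centre of a 640x480 image labelled `SPTS`, this prints `499 499 1083 1080 1084 1083`. A full image-level sequence would presumably concatenate such per-instance chunks; how the project orders, pads and terminates them is not shown here.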
<center>
<img src="https://user-images.githubusercontent.com/22607038/215685203-fbf2d00c-39d3-48bb-9d05-4fd28c56431c.png">
</center>
## Usage
### Prerequisites
- Python 3.7
- PyTorch 1.6 or higher
- [MIM](https://github.com/open-mmlab/mim)
- [MMOCR](https://github.com/open-mmlab/mmocr)

All the commands below rely on `PYTHONPATH` being configured correctly: it should point to the project's directory so that Python can locate the module files. In the `SPTS/` root directory, run the line for your platform to add the current directory to `PYTHONPATH`:
```shell
# Linux
export PYTHONPATH=$(pwd):$PYTHONPATH
# Windows PowerShell
$env:PYTHONPATH = "$(Get-Location);$env:PYTHONPATH"
```
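On Linux you can confirm that the export took effect before running anything else. This is a small sketch for POSIX shells only (it does not cover PowerShell), and the function name `on_pythonpath` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: check that a directory is one of the ':'-separated PYTHONPATH entries.
set -euo pipefail

on_pythonpath() {
  local target=$1 entry
  local IFS=':'                      # split PYTHONPATH on ':' only
  for entry in ${PYTHONPATH:-}; do
    [ "$entry" = "$target" ] && return 0
  done
  return 1
}

if on_pythonpath "$(pwd)"; then
  echo "PYTHONPATH includes $(pwd)"
else
  echo "PYTHONPATH is missing $(pwd); run the export line above" >&2
fi
```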
### Dataset
For now, the implementation uses the datasets provided by SPTS for **pre-training** and MMOCR's datasets for **fine-tuning and testing**. This is because the test splits of SPTS's datasets do not contain enough information for end-to-end evaluation, and MMOCR's Dataset Preparer does not yet support all the datasets used in SPTS. *We are working on this issue, and these datasets will be available in MMOCR's Dataset Preparer very soon.*
Please follow these steps to prepare the datasets:
- Download and extract all the SPTS datasets into `spts-data/` following [SPTS official guide](https://github.com/shannanyinxiang/SPTS#dataset).
- Use the [Dataset Preparer](https://mmocr.readthedocs.io/en/dev-1.x/user_guides/data_prepare/dataset_preparer.html) to prepare `icdar2013`, `icdar2015` and `totaltext` for the `textspotting` task:
```shell
# Run in MMOCR's root directory
python tools/dataset_converters/prepare_dataset.py icdar2013 icdar2015 totaltext --task textspotting
```
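Then, in the project root directory, create a symbolic link to MMOCR's `data/` directory (the relative path below assumes the project sits two levels below MMOCR's root, e.g. in `projects/SPTS/`):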
```shell
ln -s ../../data/ .
```
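A quick way to check that the link resolves and that the three datasets prepared above are reachable. This sketch assumes the Dataset Preparer wrote each dataset to `data/<dataset_name>/`, its usual layout; adjust the names if yours differ. `check_data` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: verify that data/ (the symlink created above) exposes the prepared datasets.
# Assumes the Dataset Preparer layout data/<dataset_name>/.
set -euo pipefail

check_data() {
  local root=${1:-data} missing=0 name
  for name in icdar2013 icdar2015 totaltext; do
    if [ ! -d "$root/$name" ]; then   # -d follows symlinks
      echo "missing: $root/$name" >&2
      missing=1
    fi
  done
  return "$missing"
}

if check_data data; then
  echo "all datasets found under data/"
else
  echo "fix the data/ link or rerun the Dataset Preparer" >&2
fi
```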
### Training commands
In the current directory, run the following command to train the model:
#### Pretrain
```bash
mim train mmocr config/spts/spts_resnet50_8xb8-150e_pretrain-spts.py --work-dir work_dirs/ --amp
```
To train on multiple GPUs, e.g. 8 GPUs, run the following command:
```bash
mim train mmocr config/spts/spts_resnet50_8xb8-150e_pretrain-spts.py --work-dir work_dirs/ --launcher pytorch --gpus 8 --amp
```
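If you script the pretraining step, a small wrapper can choose between the two commands above based on the GPU count. `pretrain_cmd` is a hypothetical helper, not part of the project; it only prints the command so you can inspect it first, and it uses no flags beyond those shown above.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the single- or multi-GPU pretraining command.
set -euo pipefail

CONFIG=config/spts/spts_resnet50_8xb8-150e_pretrain-spts.py

pretrain_cmd() {
  local gpus=${1:-1}
  if (( gpus > 1 )); then
    echo "mim train mmocr $CONFIG --work-dir work_dirs/ --launcher pytorch --gpus $gpus --amp"
  else
    echo "mim train mmocr $CONFIG --work-dir work_dirs/ --amp"
  fi
}

pretrain_cmd 8
```

Once the printed command looks right, you could run it with `eval "$(pretrain_cmd 8)"`.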
#### Finetune
Similarly, run the following command to fine-tune the pretrained model on a dataset (e.g. icdar2013), where `${CHECKPOINT_PATH}` is the path to the pretrained checkpoint:
```bash
mim train mmocr config/spts/spts_resnet50_8xb8-200e_icdar2013.py --work-dir work_dirs/ --cfg-options "load_from=${CHECKPOINT_PATH}" --amp
```
To fine-tune on multiple GPUs, e.g. 8 GPUs, run the following command:
```bash
mim train mmocr config/spts/spts_resnet50_8xb8-200e_icdar2013.py --work-dir work_dirs/ --launcher pytorch --gpus 8 --cfg-options "load_from=${CHECKPOINT_PATH}" --amp
```
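To fill in `CHECKPOINT_PATH` after pretraining, you could pick the newest epoch checkpoint from the pretraining work directory. This sketch assumes MMEngine's default checkpoint naming, `epoch_<N>.pth`; `latest_epoch_ckpt` is a hypothetical helper. Because the commands above reuse `work_dirs/` for both stages, run it before fine-tuning starts writing its own checkpoints there, or give the two stages separate work directories.

```bash
#!/usr/bin/env bash
# Sketch: find the highest-numbered epoch_<N>.pth in a work directory.
# Assumes MMEngine's default checkpoint file names.
set -euo pipefail

latest_epoch_ckpt() {
  local dir=${1:-work_dirs} best="" best_n=-1 f n
  for f in "$dir"/epoch_*.pth; do
    [ -e "$f" ] || continue           # no match: the glob stays literal
    n=${f##*/epoch_}                  # strip the directory and "epoch_" prefix
    n=${n%.pth}                       # strip the extension
    [[ $n =~ ^[0-9]+$ ]] || continue  # skip names that are not epoch_<int>.pth
    if (( n > best_n )); then
      best_n=$n
      best=$f
    fi
  done
  [ -n "$best" ] && echo "$best"      # non-zero status when nothing was found
}

if CHECKPOINT_PATH=$(latest_epoch_ckpt work_dirs); then
  echo "load_from=$CHECKPOINT_PATH"
else
  echo "no epoch_*.pth found in work_dirs/" >&2
fi
```

The comparison is numeric, so `epoch_150.pth` wins over `epoch_99.pth` even though the glob lists them alphabetically.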
### Testing commands
In the current directory, run the following command to test the model on a dataset (e.g. icdar2013):
```bash
mim test mmocr config/spts/spts_resnet50_8xb8-200e_icdar2013.py --work-dir work_dirs/ --checkpoint ${CHECKPOINT_PATH}
```
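To evaluate on all three datasets in one go, you could loop over them. The ICDAR 2015 and Total-Text config names below are an assumption that they follow the ICDAR 2013 pattern (the download links in the results table use matching `spts_resnet50_200e_<dataset>` names); check `config/spts/` before relying on them. `test_cmd` and the per-dataset checkpoint paths are hypothetical, and the helper only prints the commands.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the test command for a given dataset.
# Config names other than icdar2013 are ASSUMED to follow the same pattern.
set -euo pipefail

test_cmd() {
  local dataset=$1 ckpt=$2
  echo "mim test mmocr config/spts/spts_resnet50_8xb8-200e_${dataset}.py --work-dir work_dirs/ --checkpoint $ckpt"
}

for dataset in icdar2013 icdar2015 totaltext; do
  # Placeholder checkpoint path per dataset; replace with your fine-tuned weights.
  test_cmd "$dataset" "work_dirs/${dataset}.pth"
done
```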
## Convert Weights from Official Repo
Users may download the official weights from [SPTS](https://github.com/shannanyinxiang/SPTS#inference) and convert them into the MMOCR format with the conversion script:
```bash
python tools/ckpt_adapter.py [SPTS_WEIGHTS_PATH] [MMOCR_WEIGHTS_PATH]
```
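A thin wrapper around the conversion command can fail early on a wrong input path and create the output folder first. `convert_ckpt` is a hypothetical name; the script call inside it is exactly the command above with its two positional paths, and the example paths are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around tools/ckpt_adapter.py: validate paths, then convert.
set -euo pipefail

convert_ckpt() {
  local src=$1 dst=$2
  if [ ! -f "$src" ]; then
    echo "SPTS weights not found: $src" >&2
    return 1
  fi
  mkdir -p "$(dirname "$dst")"   # make sure the output folder exists
  python tools/ckpt_adapter.py "$src" "$dst"
}

# Example call (placeholder paths):
# convert_ckpt pretrained/spts_official.pth checkpoints/spts_mmocr.pth
```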
## Results
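All the models are trained on 8x A100 GPUs with AMP enabled (`--amp`), with an overall batch size of 64. For ICDAR 2013 and ICDAR 2015, the Generic, Weak and Strong columns are end-to-end results with the generic, weak and strong lexicons, respectively; for Total-Text, None-Hmean and Full-Hmean are end-to-end Hmean without a lexicon and with the full lexicon.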
| Name | Pretrained | Generic | Weak | Strong | Download |
| ---------- | --------------------------------------------------------------------------------------- | ------- | ----- | ------ | ------------------------------------------------------------------------------------- |
| ICDAR 2013 | [model](https://download.openmmlab.com/mmocr/textspotting/spts/spts_resnet50_150e_pretrain-spts/spts_resnet50_150e_pretrain-spts-c9fe4c78.pth) / [log](https://download.openmmlab.com/mmocr/textspotting/spts/spts_resnet50_150e_pretrain-spts/20230223_194550.log) | 87.10 | 91.46 | 93.41 | [model](https://download.openmmlab.com/mmocr/textspotting/spts/spts_resnet50_200e_icdar2013/spts_resnet50_200e_icdar2013-64cb4d31.pth) / [log](https://download.openmmlab.com/mmocr/textspotting/spts/spts_resnet50_200e_icdar2013/20230303_140316.log) |
| ICDAR 2015 | [model](https://download.openmmlab.com/mmocr/textspotting/spts/spts_resnet50_150e_pretrain-spts/spts_resnet50_150e_pretrain-spts-c9fe4c78.pth) / [log](https://download.openmmlab.com/mmocr/textspotting/spts/spts_resnet50_150e_pretrain-spts/20230223_194550.log) | 69.09 | 73.45 | 79.19 | [model](https://download.openmmlab.com/mmocr/textspotting/spts/spts_resnet50_200e_icdar2015/spts_resnet50_200e_icdar2015-d6e8621c.pth) / [log](https://download.openmmlab.com/mmocr/textspotting/spts/spts_resnet50_200e_icdar2015/20230302_230026.log) |

| Name | Pretrained | None-Hmean | Full-Hmean | Download |
| :-------: | -------------------------------------------------------------------------------------- | :--------: | :--------: | ------------------------------------------------------------------------------------- |
| Totaltext | [model](https://download.openmmlab.com/mmocr/textspotting/spts/spts_resnet50_150e_pretrain-spts/spts_resnet50_150e_pretrain-spts-c9fe4c78.pth) / [log](https://download.openmmlab.com/mmocr/textspotting/spts/spts_resnet50_150e_pretrain-spts/20230223_194550.log) | 73.99 | 82.34 | [model](https://download.openmmlab.com/mmocr/textspotting/spts/spts_resnet50_200e_totaltext/spts_resnet50_200e_totaltext-e3521af6.pth) / [log](https://download.openmmlab.com/mmocr/textspotting/spts/spts_resnet50_200e_totaltext/20230303_103040.log) |
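The overall batch size of 64 follows from the config names: in OpenMMLab's naming convention, `8xb8` means 8 GPUs with 8 samples per GPU. If you train with the same config on fewer GPUs, the total batch size shrinks accordingly, which may change the results. A sketch that reads this layout out of a config name (`total_batch` is a hypothetical helper):

```bash
#!/usr/bin/env bash
# Sketch: compute the total batch size encoded as "<gpus>xb<per_gpu>" in a config name.
set -euo pipefail

total_batch() {
  local name=$1
  if [[ $name =~ ([0-9]+)xb([0-9]+) ]]; then
    echo $(( BASH_REMATCH[1] * BASH_REMATCH[2] ))
  else
    echo "no <gpus>xb<per_gpu> field in: $name" >&2
    return 1
  fi
}

total_batch spts_resnet50_8xb8-150e_pretrain-spts.py   # 8 GPUs x 8 per GPU
```

This prints `64`, matching the batch size stated above.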
## Citation
If you find SPTS useful in your research or applications, please cite SPTS with the following BibTeX entry.
```BibTeX
@inproceedings{peng2022spts,
title={SPTS: Single-Point Text Spotting},
author={Peng, Dezhi and Wang, Xinyu and Liu, Yuliang and Zhang, Jiaxin and Huang, Mingxin and Lai, Songxuan and Zhu, Shenggao and Li, Jing and Lin
```