# OCR_TF_CRNN_CTC
This software implements the Convolutional Recurrent Neural Network (CRNN), a combination of CNN, RNN and CTC loss for image-based sequence recognition tasks, such as scene text recognition and OCR.
"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" : https://arxiv.org/abs/1507.05717
More details on CRNN and CTC loss (in Chinese): https://zhuanlan.zhihu.com/p/43534801
# Dependencies
The following dependencies must be installed:
* tensorflow==1.8.0
* opencv-python
* numpy
Required packages can be installed with
```bash
pip install -r requirements.txt
```
Note: This software does not run on the latest TensorFlow release (r1.11.0), because that release changed the tf.contrib.rnn API.
# Run demo
Assume your current working directory is OCR_TF_CRNN_CTC:
```bash
cd path/to/your/OCR_TF_CRNN_CTC/
```
Download the pretrained model and extract it to your disk: [GoogleDrive](https://drive.google.com/file/d/1A3V7o3SKSiL3IHcTqc1jP4w58DuC8F9o/view?usp=sharing).
Add the current working directory to PYTHONPATH:
```bash
export PYTHONPATH=$PYTHONPATH:./
```
Run inference demo:
```bash
python tools/inference_crnn_ctc.py \
--image_dir ./test_data/images/ --image_list ./test_data/image_list.txt \
--model_dir /path/to/your/bs_synth90k_model/
```
The result is:
```
Predict 1_AFTERSHAVE_1509.jpg image as: aftershave
```
![1_AFTERSHAVE_1509.jpg](https://github.com/bai-shang/CRNN_CTC_Tensorflow/blob/master/test_data/images/1_AFTERSHAVE_1509.jpg?raw=true)
```
Predict 2_LARIAT_43420.jpg image as: lariat
```
![2_LARIAT_43420](https://github.com/bai-shang/CRNN_CTC_Tensorflow/blob/master/test_data/images/2_LARIAT_43420.jpg?raw=true)
# Train a new model
### Data Preparation
* First, download the [Synth90k](http://www.robots.ox.ac.uk/~vgg/data/text/) dataset and extract it into a folder.
* Second, supply a txt file that lists, one per line, each image's path relative to the image data directory followed by its text label.
For example, image_list.txt:
```bash
90kDICT32px/1/2/373_coley_14845.jpg coley
90kDICT32px/17/5/176_Nevadans_51437.jpg nevadans
```
* Then convert your dataset into TensorFlow records (tfrecord files), which can be done with:
```bash
python tools/create_crnn_ctc_tfrecord.py \
--image_dir path/to/90kDICT32px/ --anno_file path/to/image_list.txt --data_dir ./tfrecords/ \
--validation_split_fraction 0.1
```
Note: make sure that the images can be read from the paths you specified, such as:
```bash
path/to/90kDICT32px/1/2/373_coley_14845.jpg
path/to/90kDICT32px/17/5/176_Nevadans_51437.jpg
.......
```
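Before creating the tfrecord files, it can help to confirm that every path in the annotation file resolves to a file. The following is a minimal sketch, not part of this repository: the function name `find_missing_images` and its `exists` parameter are hypothetical, and it assumes each line has the form `<relative path> <label>` shown above.

```python
import os


def find_missing_images(image_dir, anno_lines, exists=os.path.isfile):
    """Return the relative paths from anno_lines that cannot be found under image_dir.

    anno_lines is any iterable of lines of the form "<relative/path.jpg> <label>".
    exists is the check applied to each joined path (hypothetical hook, handy for testing).
    """
    missing = []
    for line in anno_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # The path is everything before the first run of whitespace.
        rel_path = line.split(None, 1)[0]
        if not exists(os.path.join(image_dir, rel_path)):
            missing.append(rel_path)
    return missing
```

A typical call would open the annotation file and pass the file object directly, for example `find_missing_images("path/to/90kDICT32px/", open("path/to/image_list.txt"))`; an empty list means every listed image was found.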
All training images are scaled to a height of 32 and written to the tfrecord files.
The dataset is divided into a training set and a validation set; change `--validation_split_fraction` to control the ratio between them.
#### Alternatively, you can use the dowload_synth90k_and_create_tfrecord.sh script to create the tfrecord files automatically:
```
cd ./data
sh dowload_synth90k_and_create_tfrecord.sh
```
### Train model
```bash
python tools/train_crnn_ctc.py --data_dir ./tfrecords/ --model_dir ./model/ --batch_size 32
```
After a number of iterations, the terminal output looks like this:
![](https://github.com/bai-shang/CRNN_CTC_Tensorflow/blob/master/data/20180919022202.png?raw=true)
In my experiment, the loss dropped as follows:
![](https://github.com/bai-shang/CRNN_CTC_Tensorflow/blob/master/data/20180919202432.png?raw=true)
### Evaluate model
```bash
python tools/eval_crnn_ctc.py --data_dir ./tfrecords/ --model_dir ./model/
```
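# Notes
To feed the features into the recurrent layers, the input is processed as follows:
* First, the image is scaled to 32 × W × 3.
* After the CNN, the feature map has shape 1 × (W/4) × 512.
* For the LSTM, set T = W/4 and D = 512; the features can then be fed into the LSTM.

When preparing input images, it is therefore recommended to scale the height to 32 while keeping the aspect ratio, which preserves as much of the text detail in the image as possible. You can also scale input images to a fixed width, but this will degrade performance.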
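The shape arithmetic above can be sketched in a few lines. This is an illustration only: the helper names `resize_keep_aspect` and `lstm_input_shape` are hypothetical, and rounding the scaled width and using integer division for W/4 are assumptions, since the notes do not say how fractional widths are handled.

```python
def resize_keep_aspect(height, width, target_height=32):
    """Return (new_height, new_width) after scaling to target_height, aspect ratio kept."""
    if height <= 0 or width <= 0:
        raise ValueError("image dimensions must be positive")
    # Scale the width by the same factor as the height (rounding is an assumption).
    new_width = max(1, round(width * target_height / height))
    return target_height, new_width


def lstm_input_shape(scaled_width, downsample=4, channels=512):
    """Return (T, D): T = W/4 time steps, each a feature vector of D = 512 values."""
    # Integer division is an assumption for widths that are not multiples of 4.
    return scaled_width // downsample, channels
```

For example, a 64 × 200 image scales to 32 × 100, which gives the LSTM T = 25 time steps with D = 512.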
Character escaping: a double-quote character is written as `"\"" : value`.
Warning: `tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found.`
It turns out that ctc_loss requires the label lengths to be no longer than the input lengths.
If a label is too long, the loss calculator cannot unroll completely and therefore cannot compute the loss.
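The input sequence length must be >= the label length; otherwise the CTC loss cannot be computed. In other words, the recognized text may be shorter than the input sequence, but it cannot be longer.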
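A quick check along these lines can flag samples that would trigger the "No valid path found" warning. This is a hedged sketch with hypothetical function names, not code from this repository. It also accounts for a standard CTC property that the note above does not mention: two identical adjacent characters need a blank between them, so each such pair costs one extra time step.

```python
def min_ctc_input_length(label):
    """Smallest input sequence length for which CTC has a valid alignment of label."""
    # One time step per character, plus one blank between each pair of
    # identical adjacent characters (e.g. the "ff" and "ee" in "coffee").
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def is_ctc_feasible(label, seq_len):
    """Return True if a sequence of seq_len time steps can be aligned to label."""
    return seq_len >= min_ctc_input_length(label)
```

With T = W/4, an image scaled to width 100 gives 25 time steps, so `is_ctc_feasible("aftershave", 25)` holds, while a label such as "coffee" needs at least 8 time steps rather than 6.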
To support additional English or Chinese symbols, add them to char_map/char_map.json in the format `"&" : 56`.
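To illustrate the character-map format, the sketch below parses a small JSON map and encodes a label into class indices. It is an assumption-laden example: `EXAMPLE_CHAR_MAP_JSON` and `encode_label` are hypothetical, and every entry except `"&" : 56` is invented for illustration and may not match the real char_map.json.

```python
import json

# Hypothetical excerpt in the char_map.json format; only "&": 56 comes from the notes.
# The double-quote key is escaped as "\"" as described above.
EXAMPLE_CHAR_MAP_JSON = r'{"a": 0, "b": 1, "&": 56, "\"": 57}'


def encode_label(label, char_map):
    """Map each character of label to its class index, failing loudly on unknown characters."""
    encoded = []
    for ch in label:
        if ch not in char_map:
            raise ValueError("character %r is missing from the character map" % ch)
        encoded.append(char_map[ch])
    return encoded


char_map = json.loads(EXAMPLE_CHAR_MAP_JSON)
```

Raising on unknown characters, rather than skipping them, makes a missing symbol visible before training instead of silently shortening the label.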