PyTorch, TensorFlow, and deep learning implementation of an EAST-based natural scene text detection system.zip (resource) - CSDN Library

14 files in total

py: 9

sh: 1

gitattributes: 1


Artificial intelligence

Deep learning

56 views, uploaded 2024-03-28 20:01:49, 1.99MB ZIP

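EAST-based natural scene text detection system

In AI, deep learning has become one of the core techniques for complex problems, especially in image recognition and natural language processing. This project, "PyTorch, TensorFlow, and deep learning implementation of an EAST-based natural scene text detection system", is an example AI graduation or course project. It shows how to implement text detection with the deep learning frameworks PyTorch and TensorFlow.

EAST (Efficient and Accurate Scene Text Detector) is a text detection model proposed in 2017 by Zhou et al. at Megvii (Face++). Its main strengths are speed and accuracy. EAST uses a fully convolutional network (FCN) that directly predicts text geometry at every output pixel, together with a text/non-text score. The geometry is either a rotated box (RBOX: distances to the four box edges plus a rotation angle) or a quadrilateral (QUAD). This design adapts well to natural scenes and works across cluttered backgrounds and varied text layouts.

In PyTorch, the EAST architecture can be built from convolutional layers, residual blocks (this implementation uses a ResNet-50 backbone), and upsampling layers. PyTorch's dynamic computation graph makes the model easy to debug and tune. TensorFlow, the framework of the original argman/EAST code from which this version was ported, has its own strengths, such as static computation graphs (in TensorFlow 1.x) and efficient distributed training. A TensorFlow implementation can use the Keras API to simplify the model code and speed up development.

In practice, the first step is to prepare an annotated image dataset for training. Preprocessing covers data augmentation (e.g. flipping and scaling), encoding the annotations as training targets (score and geometry maps), and batched loading. Next, define the EAST model, configure the loss and an optimizer (such as Adam or SGD), and train. The original paper uses class-balanced cross-entropy for the score map and an IoU loss plus an angle loss for the geometry; this implementation replaces the cross-entropy with dice loss. Evaluate the model periodically during training and monitor progress and metrics with a visualization tool such as TensorBoard. After training, apply the model to the test set. EAST outputs a per-pixel score map and geometry map. Pixels above a score threshold are decoded into boxes and merged with locality-aware NMS, which yields the final text boxes and their confidences. These can then be passed on to text recognition.

Detection only locates text. To read it, the detector can be combined with a recognizer such as CRNN (Convolutional Recurrent Neural Network), which is trained with CTC (Connectionist Temporal Classification) to handle unsegmented sequence labeling. For deployment on resource-constrained hardware such as mobile or embedded devices, the model usually also needs quantization and optimization. Tools such as ONNX (model conversion) and TensorRT (inference optimization) can raise inference speed and reduce memory use.

The project covers deep learning fundamentals, scene text detection, two mainstream deep learning frameworks, and model training, optimization, and deployment, which makes it valuable hands-on practice for understanding AI and deep learning. Through it, students can both learn the theory and build practical and problem-solving skills.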
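EAST predicts, at each output pixel, a text score and RBOX geometry (distances to the four edges of a rotated box plus an angle). The plain-Python sketch below shows how those maps can be turned into candidate boxes. It is an illustration under stated assumptions, not this project's code. The angle sign convention, the 1/4 output resolution handling (`scale=4`), and the `score_thresh=0.8` default are assumptions, and `restore_rbox` and `decode_candidates` are hypothetical names.

```python
import math


def restore_rbox(x, y, d_top, d_right, d_bottom, d_left, theta):
    """Rebuild the four corners of a rotated box from one pixel's RBOX values.

    Assumed convention (illustrative only): the four distances are measured
    from pixel (x, y) to the box edges in the box's own unrotated frame, and
    theta rotates that frame with the standard 2-D rotation matrix in image
    coordinates. Other EAST code bases may use a different angle sign.
    """
    # Corner offsets relative to the pixel before rotation, in the order
    # top-left, top-right, bottom-right, bottom-left.
    offsets = [
        (-d_left, -d_top),
        (d_right, -d_top),
        (d_right, d_bottom),
        (-d_left, d_bottom),
    ]
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Rotate each offset, then translate it back to the pixel position.
    return [(x + cos_t * dx - sin_t * dy, y + sin_t * dx + cos_t * dy)
            for dx, dy in offsets]


def decode_candidates(score_map, geo_map, score_thresh=0.8, scale=4):
    """Turn per-pixel predictions into (corners, score) candidates.

    score_map[r][c] is a text confidence in [0, 1]; geo_map[r][c] is a
    (d_top, d_right, d_bottom, d_left, theta) tuple. Both maps are assumed to
    be 1/scale of the input resolution, with distances already in input
    pixels. score_thresh=0.8 is an illustrative default, not a tuned value.
    """
    candidates = []
    for r, row in enumerate(score_map):
        for c, score in enumerate(row):
            if score < score_thresh:
                continue  # low-confidence pixels do not produce a box
            corners = restore_rbox(c * scale, r * scale, *geo_map[r][c])
            candidates.append((corners, score))
    return candidates
```

Many neighboring pixels describe the same word, so the surviving candidates overlap heavily. That is why they are merged with locality-aware NMS (`locality_aware_nms.py` in this package) rather than returned directly.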


PyTorch, TensorFlow, and deep learning implementation of an EAST-based natural scene text detection system.zip (14 files)

ignore4134

eval.py 12KB

loss.py 2KB

hmean.py 1KB

.gitattributes 155B

main.py 5KB

LICENSE 1KB

model.py 8KB

locality_aware_nms.py 2KB

data_util.py 4KB

run.sh 41B

EAST-master (1).zip 1.97MB

README.md 4KB

data_utils.py 39KB

config.py 377B

# EAST: An Efficient and Accurate Scene Text Detector

### Description
This version will be updated soon; please stay tuned. The goal of this version is an easy-to-train model. It automatically updates best_model by comparing the current hmean with the previous best, and it makes the evaluation info for every sample easy to inspect.

1. train
2. predict
3. compress
4. compute Hmean (if Hmean is higher than before, update best_weight.pkl)
5. visualization (blue, green, red)
6. multi-scale test (update soon): multi-scale vis. (vis with score, scales)

### Thanks
This version is ported from [argman/EAST](https://github.com/argman/EAST), from TensorFlow to PyTorch.

### Check On Website
If you want to verify the results of this program, you can submit submit.zip on the [website](http://rrc.cvc.uab.es/?ch=2&com=mymethods&task=1) and see the result for every image.

### Performance
+ right: green, wrong: red, miss: blue

![visualization](https://github.com/songdejia/east-pytorch/blob/master/screenshots/vis01.png)
![visualization](https://github.com/songdejia/east-pytorch/blob/master/screenshots/vis02.png)

+ recall/precision/hmean for every test image

![hmean](https://github.com/songdejia/east-pytorch/blob/master/screenshots/hmean.png)

### Introduction
This is a PyTorch re-implementation of [EAST: An Efficient and Accurate Scene Text Detector](https://arxiv.org/abs/1704.03155v2). The features are summarized below:
+ Only the **RBOX** part is implemented.
+ A fast Locality-Aware NMS in C++ provided by the paper's author (g++/gcc version 6.0+ will be OK).
+ Evaluation: see [here](http://rrc.cvc.uab.es/?ch=4&com=evaluation&view=method_samples&task=1&m=29855&gtv=1) for the detailed results.
+ Differences from the original paper:
  + Use ResNet-50 rather than PVANET
  + Use dice loss (optimizes the IoU of the segmentation) rather than balanced cross-entropy
  + Use linear learning rate decay rather than staged learning rate decay

Thanks to the author ([@zxytim](https://github.com/zxytim)) for his help! Please cite his [paper](https://arxiv.org/abs/1704.03155v2) if you find this useful.

### Contents
1. [Installation](#installation)
2. [Download](#download)
3. [Prepare dataset/pretrain](#dataset)
4. [Test](#test)
5. [Train](#train)
6. [Examples](#examples)

### Installation
1. Any PyTorch version > 0.4.0 should work.

### Download
1. A pretrained model is not provided for now. The website is being updated; please stay tuned.

### Prepare dataset/pretrain weight
[1]. Dataset (you need to prepare datasets for training and testing)

Suggestion: you could create a soft link to root_to_this_program/dataset/train/img/*.jpg
+ train: ./dataset/train/img/img_###.jpg and ./dataset/train/gt/img_###.txt (you need to change the names)
+ test: ./data/test/img_###.jpg (images only)
+ gt.zip: ./result/gt.zip (ICDAR15 gt.zip is available on the [website](http://rrc.cvc.uab.es/?ch=2&com=mymethods&task=1))

**Note:** you can download the datasets here:
+ [ICDAR15](http://rrc.cvc.uab.es/?ch=4&com=downloads)
+ [ICDAR13](http://rrc.cvc.uab.es/?ch=2&com=downloads)

[2]. Pretrained weights
+ In config.py, set resume to True and set checkpoint to path/to/weight/file
+ I will provide pretrained weights soon

[3]. Check GPUs and CPUs

You can use the following command to check which GPUs are available for training:
```
watch -n 0.1 nvidia-smi
```
If, for example, GPUs 2 and 3 are available, set gpu_ids = [0,1] and gpu = 2 in config.py, and set CUDA_VISIBLE_DEVICES=2,3 in run.sh.

### Train
To train the model, provide the dataset path in config.py and run
```
sh run.sh
```
**Note:** modify run.sh to specify your GPU ids. If you have more than one GPU, you can pass the GPU ids to gpu_list (e.g. gpu_list=0,1,2,3) in config.py.

**Note:** rename the ICDAR2015 gt text files to img_\*.txt instead of gt_img_\*.txt (or change the code in icdar.py), and remove some extra characters from the files. See the examples in training_samples/.

### Test
By default, training and evaluation are integrated into one process. If you want to run evaluation independently, you will need to adapt the code yourself. Feel free to contact me with any questions.

### Examples
Here are some test examples on ICDAR2015. Enjoy the beautiful text boxes!

![image_1](demo_images/img_2.jpg)
![image_2](demo_images/img_10.jpg)
![image_3](demo_images/img_14.jpg)
![image_4](demo_images/img_26.jpg)
![image_5](demo_images/img_75.jpg)
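Step 4 of the README's workflow (compute Hmean and update best_weight.pkl when it improves) comes down to a harmonic mean and a running maximum. The sketch below is a minimal, hypothetical plain-Python version, not this project's implementation. It assumes the matching counts are already available; ICDAR-style evaluation typically counts a detection as matched when its IoU with a ground-truth box exceeds 0.5. `precision_recall_hmean`, `BestHmeanTracker`, and `save_fn` are illustrative names.

```python
def precision_recall_hmean(num_matched, num_gt, num_det):
    """Return (precision, recall, hmean) from detection-matching counts.

    num_matched: detections matched one-to-one with ground-truth boxes
    num_gt: number of ground-truth boxes
    num_det: number of detections
    """
    precision = num_matched / num_det if num_det else 0.0
    recall = num_matched / num_gt if num_gt else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Hmean is the harmonic mean of precision and recall (the F1 score).
    hmean = 2 * precision * recall / (precision + recall)
    return precision, recall, hmean


class BestHmeanTracker:
    """Hypothetical helper: call save_fn only when Hmean beats the best so far."""

    def __init__(self, save_fn):
        self.best = float("-inf")
        self.save_fn = save_fn

    def update(self, epoch, hmean):
        """Return True (and save) if this epoch's Hmean is a new best."""
        if hmean > self.best:
            self.best = hmean
            self.save_fn(epoch, hmean)
            return True
        return False
```

In a PyTorch training loop, `save_fn` might call `torch.save(model.state_dict(), "best_weight.pkl")`. The filename comes from the README, but how this project actually writes the file is not shown here.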
