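YOLO algorithm for scene text detection implemented in keras-tensorflow (no Object Detection API used); the code can be adapted to train YOLO for other object detection tasks (.zip resource) - CSDN Library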

20 files in total

jpg: 12

txt: 2

py: 2


Python

Deep Learning

TensorFlow

104 views · uploaded 2024-11-26 22:01:38 · 2.12MB ZIP


YOLO algorithm for scene text detection implemented in keras-tensorflow (no Object Detection API used). The code can be adapted to train YOLO for other object detection tasks.zip (20 files)

标签.txt (labels) 57B

Preprocess.py 2KB

LICENSE 1KB

Results1

113.jpg 63KB

28.jpg 127KB

30.jpg 64KB

45.jpg 79KB

57.jpg 100KB

108.jpg 76KB

114.jpg 83KB

0.jpg 80KB

93.jpg 78KB

109.jpg 115KB

9.jpg 77KB

资源内容.txt (resource contents) 1KB

Utils.py 2KB

Test

img_88.jpg 566KB

model

text_detect_model.json 79KB

README.md 3KB

Yolo.ipynb 918KB

# Text-Detection-using-Yolo-Algorithm-in-keras-tensorflow

The first step towards building an efficient OCR system is to find the specific text locations. This project implements the YOLO (You Only Look Once) algorithm from scratch (no object detection API used) for the specific task of scene text detection, in Python using Keras and TensorFlow.

## Data

The dataset used is the ICDAR competition dataset, available here: [Drive Link](https://drive.google.com/open?id=1ObrV9pbH_-LBGbIodWgB6W4dtQloTTH6)

- Train images: 376
- Validation images: 115

## Preprocessing

The `Preprocess.py` file handles all the necessary preprocessing and saves the data as NumPy arrays. First, the images are resized to 512 × 512, and the ground-truth boxes are rescaled accordingly. All images are normalized to the range [-1, 1]. The ground-truth coordinates are then encoded into a matrix of shape (grid height, grid width, 1, 5).

### Custom data

To train on custom data, make the necessary changes in `Preprocess.py` so that it loads the custom data and images and builds the appropriate input and output matrices.

## Model

MobileNetV2 is used as the feature extractor, chosen mainly for its high accuracy relative to its small number of weights. The fully connected layers of MobileNetV2 are removed, and three convolutional layers are added after its last layer to produce an output of shape (grid height, grid width, 1, 5). The model weights can be found here: [Drive Link](https://drive.google.com/open?id=1OwrEu6SeaNM3l_clLN9F40W-tMpRfz97)

## Loss Function and Training

The loss function is the one specified in the YOLO paper. Because there is only one class to predict, the class-prediction term is dropped from the loss. The model is trained for 180 epochs in total with a batch size of 4. The learning rate is kept at 0.001 for the first 100 epochs and lowered to 0.0001 for the remaining 80 epochs.
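The ground-truth encoding from the Preprocessing section and the single-class loss described above can be sketched in NumPy as follows. This is an illustration, not the repository's code: the channel layout `[confidence, x, y, w, h]`, the 16 × 16 grid, and the helper names `encode_boxes`, `yolo_loss` and `lr_schedule` are all hypothetical, and the weights λ_coord = 5 and λ_noobj = 0.5 are taken from the original YOLO paper. For simplicity, the confidence target is 1 rather than the IoU the paper uses.

```python
import numpy as np

# Hypothetical settings: the real grid size is whatever the model outputs.
GRID = 16           # assumed: 512 / 32 (MobileNetV2's output stride)
IMG_SIZE = 512      # images are resized to 512 x 512 (per the README)
LAMBDA_COORD = 5.0  # coordinate weight from the YOLO v1 paper
LAMBDA_NOOBJ = 0.5  # no-object confidence weight from the YOLO v1 paper


def encode_boxes(boxes, grid=GRID, img_size=IMG_SIZE):
    """Encode pixel boxes (x_min, y_min, x_max, y_max) into a (grid, grid, 1, 5) target.

    Assumed channel layout: [confidence, x, y, w, h]; x, y are the box centre
    relative to its cell, w, h are relative to the whole image.
    If two boxes fall into the same cell, the later one overwrites the earlier.
    """
    target = np.zeros((grid, grid, 1, 5), dtype=np.float32)
    cell = img_size / grid
    for x_min, y_min, x_max, y_max in boxes:
        cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
        # Cell containing the centre, clamped for centres on the right/bottom edge.
        col = min(int(cx // cell), grid - 1)
        row = min(int(cy // cell), grid - 1)
        target[row, col, 0] = [
            1.0,
            cx / cell - col,
            cy / cell - row,
            (x_max - x_min) / img_size,
            (y_max - y_min) / img_size,
        ]
    return target


def yolo_loss(y_true, y_pred):
    """Sum-squared YOLO v1-style loss for one class (no class-probability term)."""
    obj = y_true[..., 0]   # 1 where a box is assigned, else 0
    noobj = 1.0 - obj
    xy = np.sum((y_true[..., 1:3] - y_pred[..., 1:3]) ** 2, axis=-1)
    # Square roots of w, h damp the penalty on large boxes; predictions are
    # clipped at 0 because sqrt needs non-negative inputs.
    pred_wh = np.clip(y_pred[..., 3:5], 0.0, None)
    wh = np.sum((np.sqrt(y_true[..., 3:5]) - np.sqrt(pred_wh)) ** 2, axis=-1)
    conf = (y_true[..., 0] - y_pred[..., 0]) ** 2
    total = LAMBDA_COORD * obj * (xy + wh) + obj * conf + LAMBDA_NOOBJ * noobj * conf
    return float(np.sum(total))


def lr_schedule(epoch):
    """Step schedule from the README: 0.001 for epochs 0-99, then 0.0001."""
    return 1e-3 if epoch < 100 else 1e-4
```

For example, `encode_boxes([(100, 200, 164, 232)])` writes `[1, 0.125, 0.75, 0.125, 0.0625]` into the cell at row 6, column 4, and `yolo_loss` of that target against itself is 0. A function like `lr_schedule` could be passed to `keras.callbacks.LearningRateScheduler` to reproduce the 0.001 to 0.0001 step.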

## Inference

`Utils.py` contains the functions that convert the model's matrix output (grid height, grid width, 1, 5) into actual predicted bounding boxes. Non-max suppression is used to eliminate duplicate boxes on the same object, as described in the YOLO paper.

## Results

![alt text](https://github.com/Neerajj9/Text-Detection-using-Yolo-Algorithm-in-keras-tensorflow/blob/master/Results1/28.jpg)
![alt text](https://github.com/Neerajj9/Text-Detection-using-Yolo-Algorithm-in-keras-tensorflow/blob/master/Results1/113.jpg)
![alt text](https://github.com/Neerajj9/Text-Detection-using-Yolo-Algorithm-in-keras-tensorflow/blob/master/Results1/114.jpg)

## Requirements

1. Keras: 2.2.2
2. TensorFlow: 1.9.0
3. OpenCV: 3.4.1
4. NumPy: 1.14.3
5. Matplotlib: 2.2.2
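The inference step described under Inference (decoding the (grid height, grid width, 1, 5) output, then applying non-max suppression) can be sketched in NumPy as follows. It is not the contents of `Utils.py`: the `[confidence, x, y, w, h]` channel layout, the 0.5 confidence and IoU thresholds, and the function names `decode`, `iou` and `non_max_suppression` are hypothetical. The suppression shown is the standard greedy variant.

```python
import numpy as np


def decode(output, img_size=512, threshold=0.5):
    """Turn a (grid_h, grid_w, 1, 5) output into pixel boxes and scores.

    Assumes the hypothetical [confidence, x, y, w, h] layout: x, y relative to
    the cell, w, h relative to the image. Cells below `threshold` are skipped.
    """
    grid_h, grid_w = output.shape[:2]
    cell_h, cell_w = img_size / grid_h, img_size / grid_w
    boxes, scores = [], []
    for row in range(grid_h):
        for col in range(grid_w):
            conf, x, y, w, h = output[row, col, 0]
            if conf < threshold:
                continue
            # Convert cell-relative centre and image-relative size to pixel corners.
            cx, cy = (col + x) * cell_w, (row + y) * cell_h
            bw, bh = w * img_size, h * img_size
            boxes.append([cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2])
            scores.append(conf)
    return (np.array(boxes, dtype=np.float32).reshape(-1, 4),
            np.array(scores, dtype=np.float32))


def iou(box, others):
    """IoU between one (x1, y1, x2, y2) box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[2], others[:, 2])
    y2 = np.minimum(box[3], others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area + areas - inter + 1e-9)


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) < iou_threshold]
    return keep
```

With two heavily overlapping detections in neighbouring cells, only the higher-scoring one survives, while a distant box is kept; the returned indices point into the arrays produced by `decode`.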
