Conference on Document Analysis and Recognition (ICDAR), remains challenging
because of the wide variety of scenes, backgrounds, and text appearances. Text
localization methods fall into two categories: heuristics-based and learning-based.
Heuristics-based methods. Heuristics-based methods rely on hand-designed rules,
such as connected component analysis (CCA). Lucas et al. [2, 3] used the Maximally
Stable Extremal Regions (MSER) algorithm to detect candidate text and exploited
specific text component features for localization. In [4], the authors presented a novel
approach based on Oriented Stroke Detection to improve the poor performance on
noisy images. Another widely used method is the Stroke Width Transform [2] and its
extensions such as [3]. Epshtein et al. [2] proposed a novel stroke filter to improve the
detection of candidate text regions. Huang et al. [3] introduced a low-level filter called
the Stroke Feature Transform (SFT) by incorporating color cues of text pixels.
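The stroke-based heuristics above share one observation: text strokes have near-constant width, while background clutter does not. The following sketch is not the exact SWT of [2] but a simplified illustration of that idea, approximating per-pixel stroke width by horizontal run lengths in a binary component; the function names and the 0.5 threshold are illustrative assumptions.

```python
def horizontal_run_lengths(component):
    """Lengths of maximal horizontal runs of foreground (1) pixels."""
    runs = []
    for row in component:
        length = 0
        for px in row:
            if px:
                length += 1
            elif length:
                runs.append(length)
                length = 0
        if length:
            runs.append(length)
    return runs

def is_text_candidate(component, max_cv=0.5):
    """Accept a component when stroke-width variation (std/mean) is small."""
    runs = horizontal_run_lengths(component)
    if not runs:
        return False
    mean = sum(runs) / len(runs)
    var = sum((r - mean) ** 2 for r in runs) / len(runs)
    return (var ** 0.5) / mean <= max_cv

# A letter-like component with mostly uniform 2-px strokes,
# and a noisy blob with widely varying run lengths.
letter = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 1, 1], [1, 1, 0, 0]]
blob = [[1, 0, 1, 0, 1], [1, 1, 1, 1, 1], [0, 0, 1, 0, 0]]
```

The real SWT instead shoots rays along gradient directions at edge pixels, but the acceptance criterion is the same in spirit: low stroke-width variance within a component.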
Learning-based methods. Learning-based methods treat text localization as an
object recognition problem. They can be roughly divided into two classes,
supervised and unsupervised learning. Popular supervised learning algorithms,
such as the Support Vector Machine (SVM) and AdaBoost, were used for text
localization in [4-6]. Unsupervised classification methods were explored for text
and non-text classification in [7]. Wang et al. [8, 9] proposed to use a Convolutional
Neural Network (CNN) to learn unsupervised features for text recognition.
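To make the supervised setting concrete, here is a minimal linear SVM trained with the Pegasos sub-gradient method on toy "text vs. non-text" feature vectors. This is not the pipeline of [4-6], which use kernel SVMs or AdaBoost over far richer features; the data and hyper-parameters below are invented for demonstration only.

```python
import random

def train_linear_svm(samples, labels, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the hinge loss."""
    rng = random.Random(seed)
    dim = len(samples[0])
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(samples)), len(samples)):
            t += 1
            eta = 1.0 / (lam * t)
            x, y = samples[i], labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # weight shrinkage
            if margin < 1:  # hinge-loss sub-gradient step
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

# Toy 2-D features (e.g. stroke-width variance vs. edge density); +1 = text.
X = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
```

In practice the feature vectors would come from the character or string descriptors discussed later, and a kernel SVM would replace the linear model.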
Among the state-of-the-art methods of text localization, few have focused on
character segmentation, because character segmentation is considered as challenging
as text localization itself. In this paper, we present a hierarchical approach for text
localization that proceeds from characters to strings to words, in a semantically
bottom-up way. Unlike existing methods, which either rely on a few hand-crafted
features [4, 10, 11] or on heavy learning models [12], our approach offers a
systematic way to integrate various effective features extracted or learned at different
semantic levels. We adopt simple learning models, such as kernel SVMs and CNNs,
and focus instead on designing simple yet effective new features. The framework of
our approach is shown in Fig. 1, and some visual results are given in Fig. 2.
Character localization. For character localization, we explore three complementary
types of features: structure features, gradient-based (HOG) features, and CNN-based
features.
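The gradient-based feature can be sketched as follows: a 9-bin, magnitude-weighted histogram of gradient orientations over one grayscale cell, which is the core of HOG. Full HOG adds block normalization over multiple cells; the patch and bin count below are illustrative assumptions, not the paper's exact configuration.

```python
import math

def hog_cell(patch, bins=9):
    """Orientation histogram (0-180 degrees) for one cell, central differences."""
    h, w = len(patch), len(patch[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[min(int(angle / (180.0 / bins)), bins - 1)] += mag
    return hist

# A vertical edge: all gradients point horizontally, so the
# energy concentrates in the first orientation bin.
patch = [[0, 0, 255, 255]] * 4
feature = hog_cell(patch)
```

Character strokes produce strong, consistently oriented gradients, which is why such histograms separate characters from texture-like background well.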
String localization. For string localization, we first group characters by their
structure features, since the characters in a word often share similar properties such
as color, aspect ratio, and alignment. We then design a nine-dimensional string
feature to train an SVM model that efficiently filters out non-text strings.
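The grouping step might be sketched as below, under assumed similarity rules: two candidate characters join the same string when their heights, mean gray values, and vertical positions are close, and mutually similar characters are merged with union-find. The thresholds and the (x, y, w, h, gray) box format are illustrative, not the paper's exact criteria.

```python
def similar(a, b, height_ratio=1.5, color_tol=40, y_tol=0.5):
    """Assumed pairwise rule: close heights, colors, and baselines."""
    ax, ay, aw, ah, ac = a
    bx, by, bw, bh, bc = b
    if max(ah, bh) / min(ah, bh) > height_ratio:
        return False
    if abs(ac - bc) > color_tol:
        return False
    return abs(ay - by) <= y_tol * min(ah, bh)

def group_characters(chars):
    """Union-find grouping of mutually similar character boxes."""
    parent = list(range(len(chars)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i
    for i in range(len(chars)):
        for j in range(i + 1, len(chars)):
            if similar(chars[i], chars[j]):
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(chars)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Two dark characters on one baseline plus one tall bright outlier.
chars = [(0, 10, 8, 12, 30), (10, 11, 8, 12, 35), (30, 40, 20, 40, 220)]
groups = group_characters(chars)
```

The resulting groups are the candidate strings on which the nine-dimensional string feature and the SVM filter would then operate.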
Word localization. In word localization, we use the interval cues between adjacent
characters in a word to learn the best strategy for splitting candidate strings into
words.
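A simple baseline for the interval cue can be sketched as follows: compare each gap between adjacent character boxes against the median gap of the string, and cut wherever a gap is clearly wider. The 2.0 factor is an illustrative assumption standing in for the learned splitting strategy.

```python
import statistics

def split_string_into_words(boxes, factor=2.0):
    """boxes: (x, width) pairs sorted left-to-right; returns lists of boxes."""
    if len(boxes) < 2:
        return [boxes]
    # gap between the right edge of one box and the left edge of the next
    gaps = [boxes[i + 1][0] - (boxes[i][0] + boxes[i][1])
            for i in range(len(boxes) - 1)]
    threshold = factor * statistics.median(gaps)
    words, current = [], [boxes[0]]
    for box, gap in zip(boxes[1:], gaps):
        if gap > threshold:  # wide gap: start a new word
            words.append(current)
            current = []
        current.append(box)
    words.append(current)
    return words

# Four character boxes: small intra-word gaps, one wide inter-word gap.
boxes = [(0, 8), (10, 8), (30, 8), (40, 8)]
words = split_string_into_words(boxes)
```

A learned strategy would replace the fixed factor with a threshold fitted to training data, but the underlying cue, relative inter-character spacing, is the same.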
Our contributions are as follows: 1) a hierarchical text localization framework
that proceeds from characters to strings and then to words; 2) a group of structure
features combined with HOG and CNN features for character localization; 3)
structure and string features designed for string localization.
The rest of the paper is organized as follows: Character localization is introduced
in Section 2. String localization and word localization are proposed in Section 3 and
Section 4, respectively. Experimental results are described in Section 5. Conclusions
are given in Section 6.