HMM-based Script Identification for OCR While current OCR systems are able to recognize text in an increasing number of scripts and languages, typically they still need to be told in advance what those scripts and languages are. We propose an approach that repurposes the same HMM-based system used for OCR to the task of script/language ID, by replacing character labels with script class labels. We apply it in a multi-pass overall OCR process which achieves “universal” OCR over 54 tested languages in 18 distinct scripts, over a wide variety of typefaces in each. For comparison we also consider a brute-force ap- proach, wherein a singe HMM-based OCR system is trained to recognize all considered scripts. Results are presented on a large and diverse evaluation set extracted from book im- ages, both for script identification accuracy and for overall OCR accuracy. On this evaluation data, the script ID sys- tem provided a script ID error rate of 1.73% for 18 distinct scripts. The end-to-end OCR system with the script ID sys- tem achieved a character error rate of 4.05%, an increase of 0.77% over the case where the languages are known a priori.
- 粉丝: 6
- 资源: 3
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
最新资源
- 世界名企最完美的人才培训篇(AAAAA).doc
- 着眼长处的思维方法.doc
- 现代企业人力资源总监、职业培训师、职业经理人必看培训技巧大全.doc
- 学习资料-推荐:2006年企业年度培训方案实例(DOC_8).doc
- 最经典的培训案例.doc
- 中层主管的新型管理方式.doc
- 看世界名企怎样培养人才.docx
- 复旦大学张奇:2023年大规模语言模型中的多语言对齐与知识分区研究
- 非常好用的,U盘 启动盘制作 工作, 将U盘 分成 2个区,一个作为 启动盘,另外 一个正常存储文件,或iso
- 成功领导的六种思维方法.doc
- 成功的项目管理.doc
- 电话销售技巧.doc
- 岗位说明书的编写与应用.doc
- 非人力资源经理的人力资源管理.doc
- 高层经理人的八项修炼.doc
- 公司理财(MBA全景教程之六).doc