HMM-based Script Identification for OCR While current OCR systems are able to recognize text in an increasing number of scripts and languages, typically they still need to be told in advance what those scripts and languages are. We propose an approach that repurposes the same HMM-based system used for OCR to the task of script/language ID, by replacing character labels with script class labels. We apply it in a multi-pass overall OCR process which achieves “universal” OCR over 54 tested languages in 18 distinct scripts, over a wide variety of typefaces in each. For comparison we also consider a brute-force ap- proach, wherein a singe HMM-based OCR system is trained to recognize all considered scripts. Results are presented on a large and diverse evaluation set extracted from book im- ages, both for script identification accuracy and for overall OCR accuracy. On this evaluation data, the script ID sys- tem provided a script ID error rate of 1.73% for 18 distinct scripts. The end-to-end OCR system with the script ID sys- tem achieved a character error rate of 4.05%, an increase of 0.77% over the case where the languages are known a priori.
- 粉丝: 6
- 资源: 3
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
最新资源
- 基于51单片机和HC-05蓝牙模块、Lcd模块、DS18B20温度传感器模块利用串口通信进行环境监测源码全部资料(高分项目)
- 基于51单片机和HC-05蓝牙模块、Lcd模块、DS18B20温度传感器模块利用串口通信进行环境监测(完整高分项目代码)
- 视频播放软件(Qt6项目)
- 详细的GMTSAR操作教程
- 山东大学计算机学院2023-2024第一学期可视化期末考试回忆版
- 数据导出java案例静态方法
- Springcloud物流配送后台69809(数据库+源码)
- Sqoop数据库数据导入导出教程PDF
- springboot个人博客平台程序源码70724
- SSM社区捐赠物资管理系统 程序源码70563