arXiv:2106.14624v1 [cs.CL] 9 Jun 2021
Key Information Extraction From Documents:
Evaluation And Generator⋆
Oliver Bensch¹, Mirela Popa², and Constantin Spille³
¹ Maastricht University, 6200 MD Maastricht, The Netherlands
² Maastricht University, 6200 MD Maastricht, The Netherlands
³ KI Group GmbH
o.bensch@student.maastrichtuniversity.nl
mirela.popa@maastrichtuniversity.nl
c.spille@kigroup.de
Abstract. Extracting information from documents usually relies on natural
language processing methods working on one-dimensional sequences of text.
In some cases, for example, for the extraction of key information from
semi-structured documents, such as invoice documents, the spatial and
formatting information of text is crucial to understand the contextual
meaning. Convolutional neural networks are already common in computer
vision models to process and extract relationships in multidimensional
data. Therefore, natural language processing models have already been
combined with computer vision models in the past, to benefit from e.g.
positional information and to improve the performance of these key
information extraction models. Existing models were either trained on
unpublished data sets or on an annotated collection of receipts, which
did not focus on PDF-like documents. Hence, in this research project a
template-based document generator was created to compare state-of-the-art
models for information extraction. An existing information extraction
model, “Chargrid” (Katti et al., 2019), was reconstructed, and the impact
of a bounding box regression decoder, as well as the impact of an NLP
pre-processing step, was evaluated for information extraction from
documents. The results have shown that NLP-based pre-processing is
beneficial for model performance. However, the use of a bounding box
regression decoder increases the model performance only for fields that
do not follow a rectangular shape.
1 Introduction
Natural language processing (NLP) methods are widely used on one-dimensional
sequences of text. In some cases, for example, in the extraction of key
information from invoice documents, spatial information such as the position
of text is crucial to understand the contextual meaning.
Convolutional neural networks (CNNs) are already common in computer vision
(CV) models to process and extract relationships in multidimensional data.
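To illustrate why text position matters and how a document can become multidimensional input for a CNN, the following minimal sketch places characters on a 2D grid according to their bounding boxes, in the spirit of the character-grid idea behind Chargrid (Katti et al., 2019). It is not the authors' implementation: the box format `(character, left, top, width, height)`, the function name `build_char_grid`, and the toy input are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical input format (an assumption for this sketch):
# (character, left, top, width, height), all measured in grid cells.
CharBox = Tuple[str, int, int, int, int]


def build_char_grid(boxes: List[CharBox], height: int, width: int) -> List[List[int]]:
    """Fill each character's bounding box with an integer index (0 = background)."""
    vocab: Dict[str, int] = {}  # character -> index, in order of first appearance
    grid = [[0] * width for _ in range(height)]
    for ch, left, top, w, h in boxes:
        # Case-insensitive index; reuse the index if the character was seen before.
        idx = vocab.setdefault(ch.lower(), len(vocab) + 1)
        # Clip the box to the grid so out-of-range boxes cannot raise an IndexError.
        for y in range(max(top, 0), min(top + h, height)):
            for x in range(max(left, 0), min(left + w, width)):
                grid[y][x] = idx
    return grid


# Toy example: "No" and "7" on the first row, another "7" two rows below.
boxes = [("N", 0, 0, 1, 1), ("o", 1, 0, 1, 1), ("7", 3, 0, 1, 1), ("7", 3, 2, 1, 1)]
grid = build_char_grid(boxes, height=3, width=5)
for row in grid:
    print(row)
```

In such a grid, the two occurrences of "7" share the same index but sit at different positions, so a convolutional model can in principle use their location relative to neighbouring text, which a one-dimensional token sequence would not preserve.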
⋆ Supported by organization KI Group GmbH.