Arun Krishna Chitturi et al., International Journal of Advanced Trends in Computer Science and Engineering, 8(6), November - December 2019, 2956- 2964
ABSTRACT
Text summarization is a core task in Natural Language Processing. A summarized text should consist of unique sentences. Summarization is used in many situations in today's information technology world; one of the best examples is understanding customer feedback in companies. This job can be done by humans, but if the volume of text to be summarized is large, it consumes a great deal of time and work force. This situation led to the birth of different approaches to summarization. This paper addresses and concentrates on various methods and approaches, and their results, in abstractive text summarization. The survey gives an insight into the different types of text summarization and the various methods used in recent abstractive text summarization work.
Key words: abstractive summarization, decoder, encoder, multi-document summarization
1. INTRODUCTION
Summarization is very useful in today's world. The main aim of abstractive text summarization is to produce a shortened version of the input text that preserves its relevant meaning [7]. The adjective "abstractive" denotes that the generated summary is not a combination or selection of repeated sentences, but rather a paraphrasing of the core contents of the input document [8]. Abstractive summarization is a very difficult problem in its own right, distinct from machine translation. The main challenge in abstractive text summarization (ATS) is to compress the matter of the input document in an optimized way so that the main concepts of the document are not missed [8].

In the current technologically advancing world, volumes of data are increasing and it is very difficult to read the required data in a short time [6]. It is a demanding task to collect the required information and then convert it into summarized form. Therefore, text summarization came into demand. Summarized text saves time and helps in avoiding the retrieval of massive texts. Abstractive text summarization can be combined with numerous intelligent systems built on NLP technologies, such as information retrieval, question answering, and text classification, to find particular information [9]. If latent structure information of the summaries can be incorporated into an abstractive summarization model, then the quality of the generated summaries can be improved [10]. In some research works, topic models are used to capture the latent information from the input paragraphs or documents. Beyond its many hurdles, abstractive text summarization faces two core issues: (i) neural sequence-to-sequence models tend to produce generic summaries that mostly include frequent phrases, and (ii) the generated summaries are less readable and not grammatically perfect [11].

Summarization is divided into the following types: (a) extractive text summarization and (b) abstractive text summarization [6]. Extractive summarization extracts the frequently used or most precise phrases without modifying them and assembles the summary from them, whereas abstractive summarization generates new sentences and also optimally decreases the length of the document. Abstractive summarization is of higher quality than extractive, as it can take data from multiple documents and then generate precise summary information. Abstractive summarization is in turn achieved in two ways: (a) the structure-based approach and (b) the semantic-based approach. Neural network models based on the encoder-decoder architecture for machine translation have achieved good ROUGE scores [12]. Abstractive approaches generate summaries similar to those written by humans, but they are more expensive [13]. In the attentive RNN model, built on the current state of the RNN, the encoder computes a score over the input sentences [14].

The main problems in ATS are (a) long-document summarization, (b) abstractive metrics, and (c) controlling output length. F1 scores are generally evaluated using ROUGE metrics [15]. The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metric was proposed by (Lin, 2004) [24]. Named Entity Recognition is also one of the core applications of NLP, and helps in removing ambiguity [28]. Information retrieval is also highly difficult, and it requires quality documents [37].
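Since ROUGE recurs throughout this survey as the evaluation metric, a minimal sketch of ROUGE-1 recall (the fraction of reference unigrams covered by the candidate summary) may help make the metric concrete. This is a simplified illustration, not the official ROUGE implementation: it ignores stemming, stopword options, and sentence-level aggregation.

```python
from collections import Counter

def rouge_1_recall(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 recall: overlapping unigrams / reference unigrams."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each reference token counts at most as often as it
    # appears in the candidate.
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / max(sum(ref.values()), 1)

print(rouge_1_recall("the cat sat", "the cat sat on the mat"))  # 0.5
```

ROUGE-2 and ROUGE-L follow the same recall-oriented idea over bigrams and longest common subsequences, respectively.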
2. SURVEY
2.1 Semantic Link Network For Summarization[1]:
The Semantic Link Network (SLN) is a semantic model for organizing resources to support advanced information services such as abstractive text summarization [1]. According to the author, the semantic link network used in abstractive text summarization has the following important components:
Survey on Abstractive Text Summarization using various approaches
Arun Krishna Chitturi¹, Saravanakumar Kandaswamy²
¹ Vellore Institute of Technology, Vellore, India, chitturiarunkrishna@gmail.com
² Professor at Vellore Institute of Technology, Vellore, India, ksaravanakumar@vit.ac.in
ISSN 2278-3091
Volume 8, No.6, November – December 2019
International Journal of Advanced Trends in Computer Science and Engineering
Available Online at http://www.warse.org/IJATCSE/static/pdf/file/ijatcse45862019.pdf
https://doi.org/10.30534/ijatcse/2019/45862019
a) SLN construction
(i) Concept extraction with relation identification
(ii) Event extraction with relation identification
Event trigger extraction
Event argument extraction
Event relation extraction
b) Semantic link network summarization
a) Semantic link network construction
SLN construction involves the following two important components:
(i) Concept extraction with relation identification
Events and concepts are considered the two main base units of information present in the documents. The relations present between concepts carry critical information between events. The main advantage of using concepts and their relations in a Semantic Link Network (SLN) is that events with only indirect relations can be connected easily.
Here, the relations between concepts are simply the phrases between the concepts. The event triggers are the verbs. Valid syntactic patterns are used to identify the event triggers, i.e. the verbs. Some illustrative syntactic patterns are "be", "be" - Noun Phrase - Preposition, and "be" - Adjective - Preposition.
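The pattern-based trigger identification above can be sketched as a toy filter over POS-tagged tokens. The tag names, the pattern set, and the copula handling here are illustrative assumptions, not the paper's implementation: verbs are candidate triggers, except copular "be" forms that merely link a noun or adjective to a preposition.

```python
# Forms of "be" covered by the copular patterns mentioned above.
BE_FORMS = {"is", "are", "was", "were", "be", "been", "being"}

def candidate_triggers(tagged):
    """Return verbs that look like event triggers from (token, POS) pairs."""
    triggers = []
    for i, (tok, pos) in enumerate(tagged):
        if not pos.startswith("VB"):
            continue  # only verbs can be triggers here
        if tok.lower() in BE_FORMS:
            # "be" - NP - Prep / "be" - Adj - Prep: treat as non-eventive copula
            nxt = tagged[i + 1][1] if i + 1 < len(tagged) else ""
            nxt2 = tagged[i + 2][1] if i + 2 < len(tagged) else ""
            if nxt in {"NN", "NNS", "JJ"} and nxt2 == "IN":
                continue
        triggers.append(tok)
    return triggers

sent = [("Rebels", "NNS"), ("attacked", "VBD"), ("the", "DT"),
        ("convoy", "NN"), ("which", "WDT"), ("was", "VBD"),
        ("part", "NN"), ("of", "IN"), ("a", "DT"), ("patrol", "NN")]
print(candidate_triggers(sent))  # ['attacked']
```

Here "attacked" survives as a trigger while the copular "was part of" is filtered out by the "be" - Noun Phrase - Preposition pattern.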
(ii) Event extraction with relation identification
Event extraction with relation identification is also called FrameNet-based event extraction. Some pre-defined event schemes are present, and with the help of these schemes structured event information is extracted. Many existing approaches rely on Automatic Content Extraction (ACE), which characterizes eight types of events with thirty-three subtypes in total. The FrameNet corpus consists of complete interpretations of semantic frames along with the relations between the frames. Each formal statement is considered a frame, and many frames together with a hierarchy make up the event schemes. Event extraction consists of the following components:
Event trigger extraction
In this step, all event triggers in a given sentence are identified and their event types are classified. A log-linear model is utilized for event type classification. For a sentence X = {x1, x2, ..., xn} with triggers T = {t1, t2, ..., tn}, where ti denotes the i-th trigger word and til its lemma, the probability of assigning frame f to trigger ti is

P(f | ti, X) = exp(θ^T g(f, ti, X)) / Σ_{f' ∈ Fi} exp(θ^T g(f', ti, X))

where g is the feature function and Fi is the set of candidate frames for trigger ti.
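The log-linear formula above is a softmax over per-frame scores θ^T g(f, ti, X). A minimal sketch, with hypothetical scores standing in for the learned feature products:

```python
import math

def event_type_probs(scores):
    """Softmax over per-frame scores theta^T g(f, t_i, X)."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {f: math.exp(s - m) for f, s in scores.items()}
    z = sum(exps.values())    # the normalizer: sum over all frames in F_i
    return {f: v / z for f, v in exps.items()}

# Hypothetical scores for one trigger word over three candidate frames F_i.
probs = event_type_probs({"Attack": 2.0, "Movement": 0.5, "None": 0.0})
print(max(probs, key=probs.get))  # Attack
```

The frame with the highest score receives the highest probability, and the probabilities sum to one by construction.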
Event argument extraction
In this step, the concepts that act as arguments are identified and their argument roles are classified. Together with trigger extraction, this constitutes event extraction.
Event relation extraction
The structure of the sentences and their features are weighed to ascertain the relations between the events. Common categories of semantic links between events include the condition link, the sequential link, and the attribution link.
b) Semantic link network summarization
The extracted summary should contain the most important events and concepts, and it should also be semantically coherent. To achieve this, we maximize the saliency scores of the selected concepts and events. Let E and C denote the sets of selected event nodes and concept nodes, respectively. The objective is to maximize

Σ_{e ∈ E} θ^T f(e) + Σ_{c ∈ C} ψ^T g(c)

where f(e) denotes the features of event e and g(c) the features of concept c.
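The linear saliency objective above can be sketched with a greedy selection under a length budget. The node features, weights, and the greedy strategy itself are illustrative assumptions; the paper's model optimizes the objective jointly rather than greedily.

```python
def saliency(weights, feats):
    """Linear saliency score: weights^T feats."""
    return sum(w * x for w, x in zip(weights, feats))

def select_nodes(nodes, weights, budget):
    """Greedy sketch: keep the highest-scoring event/concept nodes
    whose total length fits within the summary budget."""
    scored = sorted(nodes, key=lambda n: saliency(weights, n["feats"]),
                    reverse=True)
    chosen, used = [], 0
    for n in scored:
        if used + n["length"] <= budget:
            chosen.append(n["name"])
            used += n["length"]
    return chosen

# Hypothetical event (e*) and concept (c*) nodes with 2-dim feature vectors.
nodes = [
    {"name": "e1", "feats": [1.0, 0.2], "length": 8},
    {"name": "c1", "feats": [0.1, 0.9], "length": 3},
    {"name": "e2", "feats": [0.05, 0.1], "length": 5},
]
print(select_nodes(nodes, [0.7, 0.3], budget=11))  # ['e1', 'c1']
```

Under these made-up weights, the most salient event and concept fill the budget and the low-scoring event e2 is dropped.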
2.2 Improved Semantic Graph Approach for Summarization [2]:
As noted above, there are two approaches to summarization:
(i) Extractive text summarization
(ii) Abstractive text summarization
Abstractive text summarization, in turn, can be achieved through two approaches:
(i) the linguistic approach
(ii) the semantic approach
Usually a graph-based approach requires human intervention, and it is also constrained to one domain: it cannot be reused for other domains. Naïve Bayes is a supervised algorithm known for its robustness [48], but it too requires human intervention.
The author proposes a semantic graph-based method for multi-document abstractive summarization (MDAS). The proposed graph-based ranking algorithm is improved by using Predicate Argument Structure (PAS) semantic similarity and two types of semantic relationships. Integrating the semantic similarity helps in determining the relations between PASs and in detecting redundancy. This approach has the following main components.
(i) Creation of the semantic graph
a. Semantic role labelling
This is the first stage: each sentence is parsed and its Predicate Argument Structure (PAS) is extracted. The multiple input documents are segmented into sentences, and every sentence is given a key based on its location and time. SENNA, a semantic role parser, is utilized to perform the semantic text analysis; it determines the PAS of a sentence by labelling semantic phrases, also called semantic arguments. These semantic arguments are classified as (a) core arguments and (b) adjunctive arguments.
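The core/adjunctive split above can be sketched by grouping SRL-style role labels into a predicate-argument structure. The input format, the role label names (PropBank-style A0/A1 and AM-* labels), and the helper itself are assumptions for illustration; SENNA's actual output format differs.

```python
# Role labels grouped as in the text: core vs. adjunctive arguments.
CORE = {"A0", "A1", "A2"}          # e.g. agent, patient, instrument
ADJUNCT = {"AM-LOC", "AM-TMP"}     # e.g. location, time

def build_pas(predicate, labeled_spans):
    """Assemble a predicate-argument structure from (role, text) pairs."""
    pas = {"predicate": predicate, "core": {}, "adjunct": {}}
    for role, text in labeled_spans:
        if role in CORE:
            pas["core"][role] = text
        elif role in ADJUNCT:
            pas["adjunct"][role] = text
    return pas

pas = build_pas("announced",
                [("A0", "the company"), ("A1", "record profits"),
                 ("AM-TMP", "on Monday")])
print(pas["core"]["A0"], "|", pas["adjunct"]["AM-TMP"])
```

Each sentence's PAS then becomes one node of the semantic graph, keyed by its location and time as described above.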
b. Semantic similarity matrix
In this stage, the semantic similarity scores of the PASs are calculated pairwise, and a matrix is built from these scores. The verb, location, noun, and time arguments of each PAS are compared with those of the other PASs to find the pairwise similarities. First, Jiang's measure finds the semantic distance between the concepts.
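The pairwise matrix construction can be sketched as below. Jiang's WordNet-based measure requires a lexical database, so a simple Jaccard token overlap stands in as a placeholder similarity here; the field names and the averaging scheme are likewise illustrative assumptions.

```python
def jaccard(a: str, b: str) -> float:
    """Placeholder similarity (Jaccard over tokens); the paper uses
    Jiang's WordNet-based measure instead."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def similarity_matrix(pas_list, fields=("verb", "noun", "loc", "time")):
    """Symmetric matrix of pairwise PAS similarities, averaged over
    the verb, noun, location, and time argument fields."""
    n = len(pas_list)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            score = sum(jaccard(pas_list[i].get(f, ""), pas_list[j].get(f, ""))
                        for f in fields) / len(fields)
            m[i][j] = m[j][i] = score
    return m

# Two hypothetical PASs sharing a verb but not a noun argument.
pas_list = [{"verb": "attack", "noun": "convoy"},
            {"verb": "attack", "noun": "patrol"}]
m = similarity_matrix(pas_list)
print(m[0][1])  # 0.25
```

With four compared fields and only the verb matching, the pair scores 1/4; a graph ranking algorithm can then run over this matrix, and near-1.0 entries flag redundant PASs.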