Arun Krishna Chitturi et al., International Journal of Advanced Trends in Computer Science and Engineering, 8(6), November - December 2019, 2956- 2964
ABSTRACT
Text summarization is a core task in Natural Language Processing. A summarized text should consist of unique sentences. Summarization is used in many situations in today's information technology world; one of the best examples is understanding customer feedback in companies. This job can be done by humans, but if the volume of text to be summarized is large, it consumes a great deal of time and work force. This situation led to the birth of different approaches to summarization. This paper addresses and concentrates on various methods and approaches, and their results, in abstractive text summarization. The survey gives an insight into the different types of text summarization and the various methods used in recent abstractive text summarization work.
Key words: abstractive summarization, decoder, encoder, multi-document summarization
1. INTRODUCTION
Summarization is very useful in today's world. The main aim of abstractive text summarization is to produce a shortened version of the input text that preserves its relevant meaning [7]. The adjective "abstractive" denotes that the generated summary is not a combination or selection of repeated sentences, but rather a paraphrasing of the core contents of the input document [8]. Abstractive summarization is a very difficult problem in its own right, distinct from machine translation. The main challenge in abstractive text summarization (ATS) is to compress the matter of the input document in an optimized way so that the main concepts of the document are not missed [8].

In the current technologically advancing world, volumes of data are increasing and it is very difficult to read the required data in a short time [6]. It is a demanding task to collect the required information and then convert it into summarized form. Therefore, text summarization came into demand. Summarized text saves time and helps in avoiding the retrieval of massive texts. Abstractive text summarization can be combined with numerous intelligent systems built on NLP technologies, such as information retrieval, question answering, and text classification, to find particular information [9]. If latent structure information of the summaries can be incorporated into an abstractive summarization model, then the quality of the generated summaries can be improved [10]. In some research works, topic models are used to capture the latent information from the input paragraphs or documents. Beyond its many hurdles, abstractive text summarization faces two core issues: (i) neural sequence-to-sequence models tend to produce generic summaries that mostly include frequent phrases, and (ii) the generated summaries are less readable and not grammatically perfect [11].

Summarization is divided into the following types: (a) extractive text summarization and (b) abstractive text summarization [6]. Extractive summarization extracts the frequently used or most precise phrases without modifying them and assembles the summary from them, whereas abstractive summarization generates new sentences and also optimally decreases the length of the document. Abstractive summarization is of higher quality than extractive, as it can take data from multiple documents and then generate precise summary information. Abstractive summarization is in turn achieved in two ways: (a) the structure-based approach and (b) the semantic-based approach. Neural network models based on the encoder-decoder architecture for machine translation have achieved good ROUGE scores [12]. Abstractive approaches generate summaries similar to those written by humans, but they are more expensive [13]. In the attentive RNN model, built on the current state of the RNN, the encoder computes a score over the input sentences [14].

The main problems in ATS are (a) long-document summarization, (b) abstractive metrics, and (c) controlling output length. F1 scores are generally evaluated using ROUGE metrics [15]. The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metric was proposed by (Lin, 2004) [24]. Named Entity Recognition is also one of the core applications of NLP, and helps in removing ambiguity [28]. Information retrieval is also highly difficult, and it requires quality documents [37].
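Since ROUGE recurs throughout this survey as the evaluation metric, a minimal sketch of ROUGE-1 recall (the fraction of reference unigrams covered by the candidate summary) may help make the metric concrete. This is a simplified illustration, not the official ROUGE implementation: it ignores stemming, stopword options, and sentence-level aggregation.

```python
from collections import Counter

def rouge_1_recall(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 recall: overlapping unigrams / reference unigrams."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each reference token counts at most as often as it
    # appears in the candidate.
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / max(sum(ref.values()), 1)

print(rouge_1_recall("the cat sat", "the cat sat on the mat"))  # 0.5
```

ROUGE-2 and ROUGE-L follow the same recall-oriented idea over bigrams and longest common subsequences, respectively.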
2. SURVEY
2.1 Semantic Link Network For Summarization[1]:
The Semantic Link Network (SLN) is a semantic model for organizing resources to support advanced information services such as abstractive text summarization [1]. According to the author, the semantic link network used in abstractive text summarization has the following important components:
Survey on Abstractive Text Summarization using various approaches
Arun Krishna Chitturi¹, Saravanakumar Kandaswamy²
¹ Vellore Institute of Technology, Vellore, India, chitturiarunkrishna@gmail.com
² Professor at Vellore Institute of Technology, Vellore, India, ksaravanakumar@vit.ac.in
ISSN 2278-3091
Volume 8, No.6, November – December 2019
International Journal of Advanced Trends in Computer Science and Engineering
Available Online at http://www.warse.org/IJATCSE/static/pdf/file/ijatcse45862019.pdf
https://doi.org/10.30534/ijatcse/2019/45862019
a) SLN construction
(i) Concept extraction with relation identification
(ii) Event extraction with relation identification
Event trigger extraction
Event argument extraction
Event relation extraction
b) Semantic link network summarization
a) Semantic link network construction
SLN construction involves the following two important components:
(i) Concept extraction with relation identification
Events and concepts are considered the two main base units of information present in the documents. The relations present between concepts carry critical information between events. The main advantage of using concepts and their relations in a Semantic Link Network (SLN) is that events with only indirect relations can be connected easily.
Here, the relations between concepts are simply the phrases between the concepts. The event triggers are the verbs. Valid syntactic patterns are used to identify the event triggers, i.e. the verbs. Some illustrative syntactic patterns are "be", "be" - Noun Phrase - Preposition, and "be" - Adjective - Preposition.
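The pattern-based trigger identification above can be sketched as a toy filter over POS-tagged tokens. The tag names, the pattern set, and the copula handling here are illustrative assumptions, not the paper's implementation: verbs are candidate triggers, except copular "be" forms that merely link a noun or adjective to a preposition.

```python
# Forms of "be" covered by the copular patterns mentioned above.
BE_FORMS = {"is", "are", "was", "were", "be", "been", "being"}

def candidate_triggers(tagged):
    """Return verbs that look like event triggers from (token, POS) pairs."""
    triggers = []
    for i, (tok, pos) in enumerate(tagged):
        if not pos.startswith("VB"):
            continue  # only verbs can be triggers here
        if tok.lower() in BE_FORMS:
            # "be" - NP - Prep / "be" - Adj - Prep: treat as non-eventive copula
            nxt = tagged[i + 1][1] if i + 1 < len(tagged) else ""
            nxt2 = tagged[i + 2][1] if i + 2 < len(tagged) else ""
            if nxt in {"NN", "NNS", "JJ"} and nxt2 == "IN":
                continue
        triggers.append(tok)
    return triggers

sent = [("Rebels", "NNS"), ("attacked", "VBD"), ("the", "DT"),
        ("convoy", "NN"), ("which", "WDT"), ("was", "VBD"),
        ("part", "NN"), ("of", "IN"), ("a", "DT"), ("patrol", "NN")]
print(candidate_triggers(sent))  # ['attacked']
```

Here "attacked" survives as a trigger while the copular "was part of" is filtered out by the "be" - Noun Phrase - Preposition pattern.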
(ii) Event extraction with relation identification
Event extraction with relation identification is also called FrameNet-based event extraction. Some pre-defined event schemes are present, and with the help of these schemes structured event information is extracted. Many existing approaches rely on Automatic Content Extraction (ACE), which characterizes eight types of events with thirty-three subtypes in total. The FrameNet corpus consists of complete interpretations of semantic frames along with the relations between the frames. Each formal statement is considered a frame, and many frames together with a hierarchy make up the event schemes. Event extraction consists of the following components:
Event trigger extraction
In this step, all event triggers in a given sentence are identified and their event types are classified. A log-linear model is utilized for event type classification. For a sentence X = {x1, x2, ..., xn} with triggers T = {t1, t2, ..., tn}, where ti denotes the i-th trigger word and til its lemma, the probability of assigning frame f to trigger ti is

P(f | ti, X) = exp(θ^T g(f, ti, X)) / Σ_{f' ∈ Fi} exp(θ^T g(f', ti, X))

where g is the feature function and Fi is the set of candidate frames for trigger ti.
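The log-linear formula above is a softmax over per-frame scores θ^T g(f, ti, X). A minimal sketch, with hypothetical scores standing in for the learned feature products:

```python
import math

def event_type_probs(scores):
    """Softmax over per-frame scores theta^T g(f, t_i, X)."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {f: math.exp(s - m) for f, s in scores.items()}
    z = sum(exps.values())    # the normalizer: sum over all frames in F_i
    return {f: v / z for f, v in exps.items()}

# Hypothetical scores for one trigger word over three candidate frames F_i.
probs = event_type_probs({"Attack": 2.0, "Movement": 0.5, "None": 0.0})
print(max(probs, key=probs.get))  # Attack
```

The frame with the highest score receives the highest probability, and the probabilities sum to one by construction.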
Event argument extraction
In this step, the concepts that act as arguments are identified and their argument roles are classified. Together with trigger extraction, this constitutes event extraction.
Event relation extraction
The structure of the sentences and their features are weighed to ascertain the relations between the events. Common categories of semantic links between events include the condition link, the sequential link, and the attribution link.
b) Semantic link network summarization
The extracted summary should contain the most important events and concepts, and it should also be semantically coherent. To achieve this, we maximize the saliency scores of the selected concepts and events. Let E and C denote the sets of selected event nodes and concept nodes, respectively. The objective is to maximize

Σ_{e ∈ E} θ^T f(e) + Σ_{c ∈ C} ψ^T g(c)

where f(e) denotes the features of event e and g(c) the features of concept c.
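The linear saliency objective above can be sketched with a greedy selection under a length budget. The node features, weights, and the greedy strategy itself are illustrative assumptions; the paper's model optimizes the objective jointly rather than greedily.

```python
def saliency(weights, feats):
    """Linear saliency score: weights^T feats."""
    return sum(w * x for w, x in zip(weights, feats))

def select_nodes(nodes, weights, budget):
    """Greedy sketch: keep the highest-scoring event/concept nodes
    whose total length fits within the summary budget."""
    scored = sorted(nodes, key=lambda n: saliency(weights, n["feats"]),
                    reverse=True)
    chosen, used = [], 0
    for n in scored:
        if used + n["length"] <= budget:
            chosen.append(n["name"])
            used += n["length"]
    return chosen

# Hypothetical event (e*) and concept (c*) nodes with 2-dim feature vectors.
nodes = [
    {"name": "e1", "feats": [1.0, 0.2], "length": 8},
    {"name": "c1", "feats": [0.1, 0.9], "length": 3},
    {"name": "e2", "feats": [0.05, 0.1], "length": 5},
]
print(select_nodes(nodes, [0.7, 0.3], budget=11))  # ['e1', 'c1']
```

Under these made-up weights, the most salient event and concept fill the budget and the low-scoring event e2 is dropped.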
2.2 Improved Semantic Graph Approach for Summarization [2]:
As noted above, there are two approaches to summarization:
(i) Extractive text summarization
(ii) Abstractive text summarization
Abstractive text summarization, in turn, can be achieved through two approaches:
(i) the linguistic approach
(ii) the semantic approach
Usually a graph-based approach requires human intervention, and it is also constrained to one domain: it cannot be reused for other domains. Naïve Bayes is a supervised algorithm known for its robustness [48], but it too requires human intervention.
The author proposes a semantic graph-based method for multi-document abstractive summarization (MDAS). The proposed graph-based ranking algorithm is improved by using Predicate Argument Structure (PAS) semantic similarity and two types of semantic relationships. Integrating the semantic similarity helps in determining the relations between PASs and in detecting redundancy. This approach has the following main components.
(i) Creation of the semantic graph
a. Semantic role labelling
This is the first stage: each sentence is parsed and its Predicate Argument Structure (PAS) is extracted. The multiple input documents are segmented into sentences, and every sentence is given a key based on its location and time. SENNA, a semantic role parser, is utilized to perform the semantic text analysis; it determines the PAS of a sentence by labelling semantic phrases, also called semantic arguments. These semantic arguments are classified as (a) core arguments and (b) adjunctive arguments.
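The core/adjunctive split above can be sketched by grouping SRL-style role labels into a predicate-argument structure. The input format, the role label names (PropBank-style A0/A1 and AM-* labels), and the helper itself are assumptions for illustration; SENNA's actual output format differs.

```python
# Role labels grouped as in the text: core vs. adjunctive arguments.
CORE = {"A0", "A1", "A2"}          # e.g. agent, patient, instrument
ADJUNCT = {"AM-LOC", "AM-TMP"}     # e.g. location, time

def build_pas(predicate, labeled_spans):
    """Assemble a predicate-argument structure from (role, text) pairs."""
    pas = {"predicate": predicate, "core": {}, "adjunct": {}}
    for role, text in labeled_spans:
        if role in CORE:
            pas["core"][role] = text
        elif role in ADJUNCT:
            pas["adjunct"][role] = text
    return pas

pas = build_pas("announced",
                [("A0", "the company"), ("A1", "record profits"),
                 ("AM-TMP", "on Monday")])
print(pas["core"]["A0"], "|", pas["adjunct"]["AM-TMP"])
```

Each sentence's PAS then becomes one node of the semantic graph, keyed by its location and time as described above.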
b. Semantic similarity matrix
In this stage, the semantic similarity scores of the PASs are calculated pairwise, and a matrix is built from these scores. The verb, location, noun, and time arguments of each PAS are compared with those of the other PASs to find the pairwise similarities. First, Jiang's measure finds the semantic distance between the concepts.
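The pairwise matrix construction can be sketched as below. Jiang's WordNet-based measure requires a lexical database, so a simple Jaccard token overlap stands in as a placeholder similarity here; the field names and the averaging scheme are likewise illustrative assumptions.

```python
def jaccard(a: str, b: str) -> float:
    """Placeholder similarity (Jaccard over tokens); the paper uses
    Jiang's WordNet-based measure instead."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def similarity_matrix(pas_list, fields=("verb", "noun", "loc", "time")):
    """Symmetric matrix of pairwise PAS similarities, averaged over
    the verb, noun, location, and time argument fields."""
    n = len(pas_list)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            score = sum(jaccard(pas_list[i].get(f, ""), pas_list[j].get(f, ""))
                        for f in fields) / len(fields)
            m[i][j] = m[j][i] = score
    return m

# Two hypothetical PASs sharing a verb but not a noun argument.
pas_list = [{"verb": "attack", "noun": "convoy"},
            {"verb": "attack", "noun": "patrol"}]
m = similarity_matrix(pas_list)
print(m[0][1])  # 0.25
```

With four compared fields and only the verb matching, the pair scores 1/4; a graph ranking algorithm can then run over this matrix, and near-1.0 entries flag redundant PASs.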