A Hierarchical Neural Autoencoder for Paragraphs and Documents
Jiwei Li, Minh-Thang Luong and Dan Jurafsky
Computer Science Department, Stanford University, Stanford, CA 94305, USA
{jiweil, lmthang, jurafsky}@stanford.edu
Abstract
Natural language generation of coherent long texts like paragraphs or longer documents is a challenging problem for recurrent network models. In this paper, we explore an important step toward this generation task: training an LSTM (Long Short-Term Memory) autoencoder to preserve and reconstruct multi-sentence paragraphs. We introduce an LSTM model that hierarchically builds an embedding for a paragraph from embeddings for sentences and words, then decodes this embedding to reconstruct the original paragraph. We evaluate the reconstructed paragraphs using standard metrics like ROUGE and Entity Grid, showing that neural models are able to encode texts in a way that preserves syntactic, semantic, and discourse coherence. While only a first step toward generating coherent text units from neural models, our work has the potential to significantly impact natural language generation and summarization. (Code for the models described in this paper is available at www.stanford.edu/~jiweil/.)
1 Introduction
Generating coherent text is a central task in natural language processing. A wide variety of theories exist for representing relationships between text units, such as Rhetorical Structure Theory (Mann and Thompson, 1988) or Discourse Representation Theory (Lascarides and Asher, 1991), for extracting these relations from text units (Marcu, 2000; LeThanh et al., 2004; Hernault et al., 2010; Feng and Hirst, 2012, inter alia), and for extracting other coherence properties characterizing the role each text unit plays with others in a discourse (Barzilay and Lapata, 2008; Barzilay and Lee, 2004; Elsner and Charniak, 2008; Li and Hovy, 2014, inter alia). However, applying these theories to text generation remains difficult. To understand how discourse units are connected, one has to understand the communicative function of each unit, and the role it plays within the context that encapsulates it, recursively all the way up for the entire text. Identifying increasingly sophisticated human-developed features may be insufficient for capturing these patterns. But developing neural-based alternatives has also been difficult. Although neural representations for sentences can capture aspects of coherent sentence structure (Ji and Eisenstein, 2014; Li et al., 2014; Li and Hovy, 2014), it is not clear how they could help in generating more broadly coherent text.
Recent LSTM models (Hochreiter and Schmidhuber, 1997) have shown powerful results on generating meaningful and grammatical sentences in sequence generation tasks like machine translation (Sutskever et al., 2014; Bahdanau et al., 2014; Luong et al., 2015) or parsing (Vinyals et al., 2014). This performance is at least partially attributable to the ability of these systems to capture local compositionality: the way neighboring words are combined semantically and syntactically to form the meanings a speaker wishes to express.
Could these models be extended to deal with generation of larger structures like paragraphs or even entire documents? In standard sequence-to-sequence generation tasks, an input sequence is mapped to a vector embedding that represents the sequence, and then to an output string of words. Multi-text generation tasks like summarization could work in a similar way: the system reads a collection of input sentences, and is then asked to generate meaningful texts with certain properties (such as, for summarization, being succinct and conclusive). A minimal sketch of this standard setup is given below.
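To make the sequence-to-sequence setup concrete, the following is a minimal illustrative sketch in PyTorch, not the hierarchical models introduced later in this paper; the class, names, and dimensions are our own hypothetical choices. An encoder LSTM compresses a word sequence into a fixed-length vector, and a decoder LSTM reconstructs the sequence from that vector:

    # A minimal sketch of a sequence-to-sequence LSTM autoencoder.
    # All names and sizes are illustrative, not the paper's models.
    import torch
    import torch.nn as nn

    class Seq2SeqAutoencoder(nn.Module):
        def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, tokens):                 # tokens: (batch, length)
            emb = self.embed(tokens)
            # The final (hidden, cell) state is the fixed-length
            # embedding of the whole input sequence.
            _, state = self.encoder(emb)
            # Teacher forcing: the decoder re-reads the gold sequence and
            # predicts each next word from the sequence embedding.
            dec_out, _ = self.decoder(emb, state)
            return self.out(dec_out)               # (batch, length, vocab)

    model = Seq2SeqAutoencoder(vocab_size=10000)
    tokens = torch.randint(0, 10000, (4, 12))      # a toy batch
    logits = model(tokens)
    # Reconstruction loss: logits at position t predict the word at t+1.
    loss = nn.CrossEntropyLoss()(logits[:, :-1].reshape(-1, 10000),
                                 tokens[:, 1:].reshape(-1))

Here the final encoder state serves as the embedding of the entire sequence, and training minimizes the cross-entropy of reconstructing each next word from that embedding.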
Just as the local semantic and syntactic compositionality of words can be captured by LSTM models, can the com-