Surfer100: Generating Surveys From Web Resources, Wikipedia-style
Irene Li, Alexander Fabbri, Rina Kawamura, Yixin Liu, Xiangru Tang,
Jaesung Tae, Chang Shen, Sally Ma, Tomoe Mizutani, Dragomir Radev
Department of Computer Science
Yale University
Abstract
Fast-developing fields such as Artificial Intelli-
gence (AI) often outpace the efforts of encyclo-
pedic sources such as Wikipedia, which either
do not completely cover recently-introduced
topics or lack such content entirely. As a re-
sult, methods for automatically producing con-
tent are valuable tools to address this informa-
tion overload. We show that recent advances
in pretrained language modeling can be com-
bined for a two-stage extractive and abstrac-
tive approach for Wikipedia lead paragraph
generation. We extend this approach to gen-
erate longer Wikipedia-style summaries with
sections and examine how such methods strug-
gle in this application through detailed stud-
ies with 100 reference human-collected sur-
veys. To the best of our knowledge, this is
the first study on utilizing web resources for
long Wikipedia-style summaries.
1 Introduction
Novel concepts are being introduced and evolv-
ing at a rate that makes creating high-quality, up-
to-date Wikipedia pages for such topics challeng-
ing. A pipeline for automatically creating such
Wikipedia pages is thus desirable. While there
has been some work on generating full Wikipedia
pages, these efforts are either domain-specific
(Sauper and Barzilay, 2009), making strong as-
sumptions about the topics being summarized
(Banerjee and Mitra, 2016), or are purely extractive
(Jha et al., 2015). In a related line of work, query-based
summarization has been applied to specific
sections of Wikipedia pages (Deutsch and Roth,
2019; Zhu et al., 2019), which can be viewed as a
more self-contained version of Wikipedia page gen-
eration. Recent Wikipedia page generation work
has focused on generating the initial leading para-
graph of a Wikipedia page (Liu et al., 2018; Liu
and Lapata, 2019; Perez-Beltrachini et al., 2019).
These papers consist of a two-step framework by
which an extractive method selects relevant con-
tent for a specific topic, and an abstractive method
generates the final summary of the topic.
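To make the two-step framework concrete, the sketch below illustrates the extract-then-abstract idea in miniature. It is not the authors' implementation: the lexical-overlap scorer is a simplified stand-in for the neural extractive rankers used in the papers above, and the abstractive stage (a pretrained model such as BART) is only indicated in a comment, since running it requires model weights.

```python
def relevance(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query words appearing in the passage.
    A stand-in for the learned extractive rankers used in practice."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / (len(q) or 1)

def extract_top_k(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Stage 1 (extractive): select the k passages most relevant to the topic."""
    return sorted(passages, key=lambda p: relevance(query, p), reverse=True)[:k]

passages = [
    "BERT is a pretrained language model based on transformers.",
    "The weather in New Haven is mild in autumn.",
    "Pretrained language models improve many NLP tasks.",
]
selected = extract_top_k("pretrained language model", passages)
# Stage 2 (abstractive, not run here): " ".join(selected) would be fed to a
# pretrained seq2seq model such as BART to generate the final lead paragraph.
```

The design point this illustrates is the division of labor: the extractive stage cheaply narrows a large pool of web text down to topic-relevant content, so the expensive abstractive model only conditions on a short, relevant input.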
In this paper, we first examine how recently-
introduced pretrained language models (Devlin
et al., 2019; Liu et al., 2019; Lewis et al., 2019)
improve upon both the extractive and abstractive
steps of previous models for the task of lead para-
graph generation. We further focus on analyzing
the extension of such methods to full Wikipedia
page generation on scientific topics related to AI
and Natural Language Processing (NLP). We man-
ually create summaries of 100 AI and NLP topics
divided along sections, as on Wikipedia pages. We
perform ablation studies on content selection and
generation methods over selected topics, finding
that current content selection methods are not pre-
cise and fail to differentiate content well among
queries for subtopics of the main topic.
Our contributions are: 1) We demonstrate how
recent advances in pretrained language models
improve Wikipedia lead paragraph generation;
2) We extend this method to generate full
Wikipedia-style pages on scientific topics;
3) For evaluation, we manually collected Surfer100,
100 SURveys From wEb Resources on scientific
topics, filling the gap in human-written surveys
built from web resources on scientific topics. We
provide a better understanding of current methods
and their shortcomings in a real-world application.
2 Wikipedia Lead Paragraph Generation
In this section, we show how combining recent
methods in a two-stage approach of content
selection and generation gives improved results on
the WikiSum dataset (Liu et al., 2018) as well as a
newly curated set of Wikipedia articles.
2.1 Data
We make use of the WikiSum dataset (Liu et al.,
2018), a collection of over 1.5 million Wikipedia
arXiv:2112.06377v1 [cs.CL] 13 Dec 2021