2021), utilize a lower-dimensional space, such as a
vector (Leblay and Chekol, 2018; Jain et al., 2020),
or a hyperplane (Dasgupta et al., 2018; Wang and
Li, 2019), for event timestamps and define a func-
tion to map an initial embedding to a time-aware
embedding.
On the other hand, evolving models assume a dy-
namic representation for entities or relations that is
updated over time. These dynamics can be captured
by shallow encoders (Xu et al., 2019; Mirtaheri
et al., 2019; Han et al., 2020a) or sequential neural
networks (Trivedi et al., 2017; Jin et al., 2020; Wu
et al., 2020; Zhu et al., 2020; Han et al., 2020b,c; Li
et al., 2021). For example, Xu et al. (2019) model entities and relations as time series, decomposing them into three components using adaptive time series decomposition. DyERNIE (Han et al., 2020a) proposes a non-Euclidean embedding approach in hyperbolic space. Trivedi et al. (2017) represent events as point processes, while Jin et al. (2020) utilize a recurrent architecture to aggregate the entity neighborhood from past timestamps.
2.2 Continual Learning
Continual learning (CL) or lifelong learning is a
learning setting where a set of tasks are learned
in a sequence. The major challenge in CL is overcoming catastrophic forgetting, where the model's performance on previously learned tasks degrades as it is updated to learn new tasks in the sequence.
Experience replay (Li and Hoiem, 2018) is a major approach to mitigating forgetting, where representative samples of past tasks are replayed when updating the model so that previously learned knowledge is retained. To maintain a fixed-size memory buffer, representative samples must be selected and the rest discarded. Schaul et al. (2016) propose selecting the samples that had the largest effect on the loss function when learning past tasks.
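For illustration, a minimal sketch of such a fixed-size replay buffer is given below; the class name, the loss-based scoring heuristic, and the uniform replay sampling are simplifying assumptions for exposition rather than the exact mechanism of any cited method.

```python
import random

class ReplayBuffer:
    """Fixed-size memory of representative past-task samples (sketch)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.samples = []  # list of (sample, score) pairs

    def add(self, sample, score: float):
        # score: e.g., how strongly the sample affected the loss when it
        # was learned; higher scores mean more representative samples.
        self.samples.append((sample, score))
        if len(self.samples) > self.capacity:
            # Discard the lowest-scoring sample to respect the fixed budget.
            drop = min(range(len(self.samples)), key=lambda i: self.samples[i][1])
            self.samples.pop(drop)

    def replay(self, batch_size: int):
        # Replayed alongside new-task data when the model is updated.
        k = min(batch_size, len(self.samples))
        return [s for s, _ in random.sample(self.samples, k)]
```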
To relax the need for a memory buffer, generative models can be trained to produce pseudo-samples of past tasks; Shin et al. (2017) use adversarial learning for this purpose, while an alternative is to generate data with autoencoders (Rostami et al., 2020; Rostami and Galstyan, 2023a). Weight consolidation is another important approach to mitigating catastrophic forgetting (Zenke et al., 2017; Kirkpatrick et al., 2017). The idea is to identify the weights that play an important role in encoding the knowledge learned from past tasks and to consolidate them when the model is updated to learn new tasks. As a result, new tasks are learned primarily using the remaining free learnable weights. In our framework, we combine both approaches to achieve optimal performance.
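As a rough illustration of weight consolidation, the EWC-style penalty sketched below keeps parameters that were important for past tasks close to their previously learned values; the function signature and the precomputed per-parameter importance (e.g., a diagonal Fisher estimate) are illustrative assumptions, not our exact formulation.

```python
import torch

def consolidation_penalty(model, old_params, importance, strength=1.0):
    """EWC-style quadratic penalty (sketch).

    old_params: parameter values saved after learning past tasks.
    importance: per-parameter importance weights (e.g., diagonal Fisher).
    """
    penalty = 0.0
    for name, param in model.named_parameters():
        penalty = penalty + (importance[name] * (param - old_params[name]) ** 2).sum()
    return strength * penalty

# When learning a new task, the penalty is added to the task loss:
# loss = task_loss + consolidation_penalty(model, old_params, importance)
```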
2.3 Continual Learning for Graphs
CL in the context of graph structures remains an
under-explored area, with a limited number of re-
cent studies addressing the challenge of dynamic
heterogeneous networks (Tang and Matteson, 2021;
Wang et al., 2020; Zhou and Cao, 2021) and se-
mantic knowledge graphs (Song and Park, 2018;
Daruna et al., 2021; Wu et al., 2021). In particular, Song and Park (2018) and Daruna et al. (2021) propose methods that integrate class-incremental learning models with static translation-based approaches, such as TransE (Bordes et al., 2013), to address the problem of continual KG embedding. Additionally, TIE (Wu et al., 2021) develops a framework that predominantly focuses on semantic KGs and generates yearly graph snapshots by
converting a fact with a time interval into multiple
timestamped facts. This process can cause a loss
of more detailed temporal information, such as the
month and date, and results in a substantial overlap
of over 95% between consecutive snapshots. TIE’s
frequency-based experience replay mechanism op-
erates by sampling a fixed set of data points from a
fixed-length window of past graph snapshots; for
instance, at a given time t, it has access to the snapshots from t−1 to t−5. This contrasts with the standard continual learning practice, which involves
sampling data points from the current dataset and
storing them in a continuously updated, fixed-size
memory buffer. Compared to Elastic Weight Consolidation (EWC), the L2 regularizer used by TIE is more rigid when learning new tasks over time. Furthermore, their evaluation is confined to shallow KG completion models such as Diachronic Embeddings (Goel et al., 2020) and HyTE (Dasgupta et al., 2018).
3 Problem Definition
This section presents the formal definition of con-
tinual temporal knowledge graph completion.
3.1 Temporal Knowledge Graph Reasoning
A TKG is a collection of events represented as a set
of quadruples G = {(s, r, o, τ) | s, o ∈ E, r ∈ R}, where E and R are the sets of entities and relations, and τ is the timestamp of the event occurrence.
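For illustration, such a quadruple set can be stored directly as timestamped tuples; the field names and the toy events below are assumptions made purely for exposition.

```python
from typing import NamedTuple, Set

class Quadruple(NamedTuple):
    s: str      # subject entity in E
    r: str      # relation in R
    o: str      # object entity in E
    tau: int    # timestamp of the event occurrence

# A toy TKG with two timestamped events (entities and relations are illustrative).
G: Set[Quadruple] = {
    Quadruple("Barack_Obama", "visit", "France", 20090403),
    Quadruple("Barack_Obama", "meet", "Angela_Merkel", 20090605),
}
```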