2021), utilize a lower-dimensional space, such as a
vector (Leblay and Chekol, 2018; Jain et al., 2020),
or a hyperplane (Dasgupta et al., 2018; Wang and
Li, 2019), for event timestamps and define a func-
tion to map an initial embedding to a time-aware
embedding.
On the other hand, evolving models assume a dy-
namic representation for entities or relations that is
updated over time. These dynamics can be captured
by shallow encoders (Xu et al., 2019; Mirtaheri
et al., 2019; Han et al., 2020a) or sequential neural
networks (Trivedi et al., 2017; Jin et al., 2020; Wu
et al., 2020; Zhu et al., 2020; Han et al., 2020b,c; Li
et al., 2021). For example, Xu et al. (2019) model entities and relations as time series, decomposing them into three components using adaptive time series decomposition. DyERNIE (Han et al., 2020a) proposes a non-Euclidean embedding approach in hyperbolic space. Trivedi et al. (2017) represent events as point processes, while Jin et al. (2020) utilize a recurrent architecture to aggregate the entity neighborhood from past timestamps.
2.2 Continual Learning
Continual learning (CL) or lifelong learning is a
learning setting where a set of tasks are learned
in a sequence. The major challenge in CL is overcoming catastrophic forgetting, where the model's performance on previously learned tasks degrades as it is updated to learn new tasks in the sequence.
Experience replay (Li and Hoiem, 2018) is a major approach to mitigating forgetting, where representative samples of past tasks are replayed when updating the model so that previously learned knowledge is retained. To maintain a fixed-size memory buffer, representative samples must be selected and the rest discarded. Schaul et al. (2016) propose selecting the samples that had the largest effect on the loss function when learning past tasks.
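For illustration, a minimal sketch of such a fixed-size replay buffer is given below; the class name, the loss-based scoring heuristic, and the uniform replay sampling are simplifying assumptions for exposition rather than the exact mechanism of any cited method.

```python
import random

class ReplayBuffer:
    """Fixed-size memory of representative past-task samples (sketch)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.samples = []  # list of (sample, score) pairs

    def add(self, sample, score: float):
        # score: e.g., how strongly the sample affected the loss when it
        # was learned; higher scores mean more representative samples.
        self.samples.append((sample, score))
        if len(self.samples) > self.capacity:
            # Discard the lowest-scoring sample to respect the fixed budget.
            drop = min(range(len(self.samples)), key=lambda i: self.samples[i][1])
            self.samples.pop(drop)

    def replay(self, batch_size: int):
        # Replayed alongside new-task data when the model is updated.
        k = min(batch_size, len(self.samples))
        return [s for s, _ in random.sample(self.samples, k)]
```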
To relax the need for a memory buffer, generative models can be trained to produce pseudo-samples of past tasks; Shin et al. (2017) use adversarial learning for this purpose, while an alternative is to generate data with autoencoders (Rostami et al., 2020; Rostami and Galstyan, 2023a). Weight consolidation is another important approach to mitigating catastrophic forgetting (Zenke et al., 2017; Kirkpatrick et al., 2017). The idea is to identify the weights that play an important role in encoding the knowledge learned from past tasks and to consolidate them when the model is updated to learn new tasks. As a result, new tasks are learned primarily using the remaining free learnable weights. In our framework, we combine both approaches to achieve optimal performance.
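As a rough illustration of weight consolidation, the EWC-style penalty sketched below keeps parameters that were important for past tasks close to their previously learned values; the function signature and the precomputed per-parameter importance (e.g., a diagonal Fisher estimate) are illustrative assumptions, not our exact formulation.

```python
import torch

def consolidation_penalty(model, old_params, importance, strength=1.0):
    """EWC-style quadratic penalty (sketch).

    old_params: parameter values saved after learning past tasks.
    importance: per-parameter importance weights (e.g., diagonal Fisher).
    """
    penalty = 0.0
    for name, param in model.named_parameters():
        penalty = penalty + (importance[name] * (param - old_params[name]) ** 2).sum()
    return strength * penalty

# When learning a new task, the penalty is added to the task loss:
# loss = task_loss + consolidation_penalty(model, old_params, importance)
```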
2.3 Continual Learning for Graphs
CL in the context of graph structures remains an
under-explored area, with a limited number of re-
cent studies addressing the challenge of dynamic
heterogeneous networks (Tang and Matteson, 2021;
Wang et al., 2020; Zhou and Cao, 2021) and se-
mantic knowledge graphs (Song and Park, 2018;
Daruna et al., 2021; Wu et al., 2021). In particular, Song and Park (2018) and Daruna et al. (2021) propose methods that integrate class-incremental learning models with static translation-based approaches, such as TransE (Bordes et al., 2013), to address the problem of continual KG embedding. Additionally, TIE (Wu et al., 2021) develops a framework that predominantly focuses on semantic KGs and generates yearly graph snapshots by
converting a fact with a time interval into multiple
timestamped facts. This process can cause a loss
of more detailed temporal information, such as the
month and date, and results in a substantial overlap
of over 95% between consecutive snapshots. TIE’s
frequency-based experience replay mechanism op-
erates by sampling a fixed set of data points from a
fixed-length window of past graph snapshots; for
instance, at a given time t, it has access to the snapshots from t−1 to t−5. This contrasts with the standard continual learning practice, which involves
sampling data points from the current dataset and
storing them in a continuously updated, fixed-size
memory buffer. Compared to Elastic Weight Consolidation (EWC), the L2 regularizer used by TIE is more rigid when learning new tasks over time. Furthermore, their evaluation is confined to shallow KG completion models such as Diachronic Embeddings (Goel et al., 2020) and HyTE (Dasgupta et al., 2018).
3 Problem Definition
This section presents the formal definition of con-
tinual temporal knowledge graph completion.
3.1 Temporal Knowledge Graph Reasoning
A TKG is a collection of events represented as a set
of quadruples G = {(s, r, o, τ) | s, o ∈ E, r ∈ R}, where E and R are the sets of entities and relations, and τ is the timestamp of the event occurrence.
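For illustration, such a quadruple set can be stored directly as timestamped tuples; the field names and the toy events below are assumptions made purely for exposition.

```python
from typing import NamedTuple, Set

class Quadruple(NamedTuple):
    s: str      # subject entity in E
    r: str      # relation in R
    o: str      # object entity in E
    tau: int    # timestamp of the event occurrence

# A toy TKG with two timestamped events (entities and relations are illustrative).
G: Set[Quadruple] = {
    Quadruple("Barack_Obama", "visit", "France", 20090403),
    Quadruple("Barack_Obama", "meet", "Angela_Merkel", 20090605),
}
```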