We think of a diffusion on a network as a process where neighboring nodes switch states from in-
active to active. The network over which activations propagate is usually unknown and unobserved.
Commonly, we only observe the times when particular nodes get “infected” but we do not observe
who infected them. In case of information propagation, as bloggers discover new information, they
write about it without explicitly citing the source [15]. Thus, we only observe the time when a blog
gets “infected” but not where it got infected from. Similarly, in disease spreading, we observe peo-
ple getting sick without usually knowing who infected them [26]. And, in a viral marketing setting,
we observe people purchasing products or adopting particular behaviors without explicitly knowing
who was the influencer that caused the adoption or the purchase [11]. Thus, the question is, if we as-
sume that the network is static over time, is it possible to reconstruct the unobserved social network
over which diffusions took place? What is the structure of such a network?
We develop convex programming based approach for inferring the latent social networks from dif-
fusion data. We first formulate a generative probabilistic model of how, on a fixed hypothetical
network, contagions spread through the network. We then write down the likelihood of observed
diffusion data under a given network and diffusion model parameters. Through a series of steps we
show how to obtain a convex program with a l
1
-like penalty term that encourages sparsity. We evalu-
ate our approach on synthetic as well as real-world email and viral marketing datasets. Experiments
reveal that we can near-perfectly recover the underlying network structure as well as the parameters
of the propagation model. Moreover, our approach scales well since we can infer optimal networks
of a thousand nodes in a matter of minutes.
Further related work. There are several different lines of work connected to our research. First is
the network structure learning for estimating the dependency structure of directed graphical mod-
els [7] and probabilistic relational models [7]. However, these formulations are often intractable
and one has to reside to heuristic solutions. Recently, graphical Lasso methods [25, 21, 6, 19] for
static sparse graph estimation and extensions to time evolving graphical models [1, 8, 22] have been
proposed with lots of success. Our work here is similar in a sense that we “regress” the infection
times of a target node on infection times of other nodes. Additionally, our work is also related to a
link prediction problem [12, 23, 18, 24] but different in a sense that this line of work assumes that
part of the network is already visible to us.
The work most closely related to ours, however, is [10], which also infers networks through cascade
data. The algorithm proposed (called NetInf) assumes that the weights of the edges in latent network
are homogeneous, i.e. all connected nodes in the network infect/influence their neighbors with the
same probability. When this assumption holds, the algorithm is very accurate and is computationally
feasible, but here we remove this assumption in order to address a more general problem. Further-
more, where [10] is an approximation algorithm, our approach guarantees optimality while easily
handling networks with thousands of nodes.
2 Problem Formulation and the Proposed Method
We now define the problem of inferring a latent social networks based on network diffusion data,
where we only observe identities of infected nodes. Thus, for each node we know the interval
during which the node was infected, whereas the source of each node’s infection is unknown. We
assume only that an infected node was previously infected by some other previously infected node
to which it is connected in the latent social network (which we are trying to infer). Our method-
ology can handle a wide class of information diffusion and epidemic models, like the independent
contagion model, the Susceptible–Infected(SI), Susceptible–Infected–Susceptible(SIS) or even the
Susceptible–Infected–Recovered (SIR) model [2]. We show that calculating the maximum likeli-
hood estimator (MLE) of the latent network (under any of the above diffusion models) is equivalent
to a convex problem that can be efficiently solved.
Problem formulation: The cascade model. We start by first introducing the model of the diffusion
process. As the contagion spreads through the network, it leaves a trace that we call a cascade.
Assume a population of N nodes, and let A be the N × N weighted adjacency matrix of the network
that is unobserved and that we aim to infer. Each entry (i, j) of A models the conditional probability
of infection transmission:
A
ij
= P (node i infects node j | node i is infected).
2
评论0
最新资源