
7/24/2019 Spreading Process in Multilayer Networks

1/48

Spreading Process in Multilayer Networks

Luca Casini

[email protected]

Master's Degree in Computer Science (Corso di Laurea Magistrale in Informatica)

Academic Year 2015/2016


2/48

Introduction


3/48

Introduction

Networks are used to model many real-world systems.

e.g. transportation networks, computer networks, (online) social networks

Spreading processes may involve humans, information of various kinds, or viral agents.

Biologists first studied the diffusion of pathogens; network science later built on their work.

These models have gained attention in recent years for their application to information spreading over communication systems and social networks.

Early work focused on simple (single-layer) networks, but multilayer networks are becoming increasingly important.


4/48

Preliminaries and Definitions


5/48

Multilayer Network

Multilayer networks are composed of monoplex networks, which are modeled as traditional graphs.

We use the following notation:

$V$: the set of nodes in a multilayer network

$L$: the set of layers in a multilayer network

$n$: the number of nodes

$(u, l_u)$: node $u$ on layer $l_u$

$((u, l_u), (v, l_v))$: edge between node $u$ on layer $l_u$ and node $v$ on layer $l_v$


6/48

Multilayer Network

ADD TEXT


7/48

Types of Multilayer Network

We can define two types of multilayer network:

Multiplex Network: All layers contain (almost) the same nodes.

e.g. the same group of people on multiple social networks

Interdependent Network: Nodes belong to just one layer. This kind of network may be seen as interconnected communities within a large monoplex network.

e.g. power and communication infrastructure networks
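The two definitions can be made concrete with a small sketch. It assumes a multilayer network is stored as a list of edges between (node, layer) pairs; the function names and the example data are hypothetical, and the "(almost)" in the multiplex definition is ignored, so the check is stricter than the definition.

```python
from collections import defaultdict


def layers_per_node(edges):
    """Map each node to the set of layers on which it appears."""
    layers = defaultdict(set)
    for (u, lu), (v, lv) in edges:
        layers[u].add(lu)
        layers[v].add(lv)
    return dict(layers)


def network_type(edges):
    """Rough classification following the two definitions above."""
    layers = layers_per_node(edges)
    all_layers = set().union(*layers.values())
    if all(ls == all_layers for ls in layers.values()):
        return "multiplex"        # every node is present on every layer
    if all(len(ls) == 1 for ls in layers.values()):
        return "interdependent"   # every node lives on exactly one layer
    return "other"


# The same three people on two social networks (layers "A" and "B").
multiplex_edges = [
    (("ann", "A"), ("bob", "A")),
    (("bob", "A"), ("cat", "A")),
    (("ann", "B"), ("cat", "B")),
    (("bob", "B"), ("cat", "B")),
]

# Power stations and routers, coupled by one inter-layer edge.
interdependent_edges = [
    (("p1", "power"), ("p2", "power")),
    (("r1", "comm"), ("r2", "comm")),
    (("p1", "power"), ("r1", "comm")),
]

print(network_type(multiplex_edges))
print(network_type(interdependent_edges))
```

Storing edges as pairs of (node, layer) tuples mirrors the notation of the previous slide and keeps intra-layer and inter-layer edges in one structure.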


8/48

Cascade and Diffusion

We call Cascade the trace of information diffusion starting from a node called the Seed.

A cascade generates an implicit network, called the Diffusion Network.

The multilayer network in which the cascade takes place is referred to as the Underlying Network.

We use the following notation:

$C$: an information cascade

$(u, l_u, v, l_v, t) \in C$: the entries of the set denoted by the cascade $C$

$D$: a multilayer diffusion network
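A minimal sketch of this notation, assuming that an entry (u, l_u, v, l_v, t) means "node u on layer l_u passed the item to node v on layer l_v at time t" and that the diffusion network is the union of the edges of all cascades. The helper names and the example entries are hypothetical.

```python
def diffusion_network(cascades):
    """Aggregate cascades into one set of directed multilayer edges."""
    edges = set()
    for cascade in cascades:
        for (u, lu, v, lv, t) in cascade:
            edges.add(((u, lu), (v, lv)))
    return edges


def seed(cascade):
    """Return the source (node, layer) of the earliest entry of a cascade."""
    u, lu, v, lv, t = min(cascade, key=lambda entry: entry[4])
    return (u, lu)


# Two illustrative cascades starting from node v4 on different layers.
c1 = {("v4", "l2", "v1", "l2", 1), ("v1", "l2", "v1", "l1", 2)}
c2 = {("v4", "l1", "v3", "l1", 1), ("v3", "l1", "v1", "l1", 2)}

D = diffusion_network([c1, c2])
print(seed(c1), seed(c2))
print(len(D))
```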


9/48


[Figure: cascade $c_1$ with seed $(v_4, l_2)$; cascade $c_2$ with seed $(v_4, l_1)$; the diffusion network resulting from the aggregation of $c_1$ and $c_2$.]


10/48


There are four possibilities of spreading:

same-node inter-layer: the cascade switches layer but not node.

e.g. a user sharing content on different social networks

other-node inter-layer: the cascade goes from one node to another on a different layer.

e.g. sharing a YouTube video on Facebook

other-node intra-layer: the cascade moves inside one layer.

e.g. simple spreading inside a social network, like retweeting

same-node intra-layer: trivial, generally omitted in studies
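The four cases depend only on whether the node and the layer change along one step of the cascade. A small sketch (the function name and the example values are hypothetical):

```python
def spreading_type(u, lu, v, lv):
    """Classify one spreading step from (u, lu) to (v, lv)."""
    node = "same-node" if u == v else "other-node"
    layer = "intra-layer" if lu == lv else "inter-layer"
    return f"{node} {layer}"


# A user re-posting their own content on another platform.
print(spreading_type("ann", "twitter", "ann", "facebook"))
# A retweet: another user, same platform.
print(spreading_type("ann", "twitter", "bob", "twitter"))
```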


11/48

Variables

Since it is difficult to obtain real datasets, research on multilayer networks consists mostly of either simulation-based studies or analytical studies based on mathematical models.

Both are based on the observation of some variables of interest and on input parameters.

Many different metrics can be found in the literature; we take a look at some of the most important.


12/48

Input Parameters

Transmissibility represents the probability of transmitting an item from one node to another. If nodes have different types we can distinguish between homogeneous and heterogeneous transmissibility.

The type of underlying network (e.g. random, small-world, scale-free).

The relationship between different layers (e.g. the correlation between node degrees).

Variations of these two parameters can produce substantial differences in the outcome of the spreading process.


13/48

Static Variables

Epidemic Threshold is one of the fundamental variables in epidemic-like models. It indicates the value of transmissibility above which the diffusion involves most of the network.

Survival Threshold indicates whether a diffusion process will survive.

Absolute-Dominance Threshold indicates whether a diffusion process can completely remove its competitor.

Infection Size (also called outbreak or cascade size) is the number of nodes in the diffusion network.

Infection Rate: the average rate of being in contact over a link.


14/48

Temporal Variables

The variables introduced so far are all static. Taking time into account we have:

Epidemic Dynamics is the fraction of infected nodes at a given time, in those models that call for recovery or death after some time.

Cascade Velocity measures how fast a cascade reaches a certain size or some relevant nodes.

Survival Probability indicates the chance that an infection started from a single node is still active at a time t.


15/48

Target-Based Variables

Sometimes there is a subset of nodes that we consider relevant based on some particular features, e.g. popular people on social networks.

In this context we can use the measures of Recall and Precision, which are commonly found in the field of information retrieval.

Recall is defined as the ratio of relevant nodes in the diffusion network over the total number of relevant nodes.

Precision is defined as the ratio of relevant nodes in the diffusion network over the total number of nodes in the diffusion network.
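A short sketch of the two measures, assuming the usual information-retrieval convention in which recall is taken over the set of relevant nodes and precision over the nodes of the diffusion network. The function name and the example sets are hypothetical.

```python
def precision_recall(diffusion_nodes, relevant_nodes):
    """Precision and recall of a diffusion network with respect to target nodes."""
    reached = set(diffusion_nodes)
    relevant = set(relevant_nodes)
    hits = reached & relevant  # relevant nodes that the cascade reached
    precision = len(hits) / len(reached) if reached else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


reached = {"a", "b", "c", "d"}   # nodes in the diffusion network
popular = {"a", "b", "e"}        # nodes considered relevant
precision, recall = precision_recall(reached, popular)
print(precision, recall)
```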


16/48

Models


17/48

Models

We will now review the most important models used to study spreading processes on multilayer networks.

First we will categorize models into two groups:

Epidemic-Like

Decision-Based

Then we will present some of the mathematical approaches used in the analysis of those models:

Generating Functions

Markov Chain Approximation

Mean-Field Theory

Game Theory


18/48

Epidemic Models

Epidemic-Like Models are generally applied to either disease or influence diffusion.

Most of the works on multilayer networks are based on the SIR, SIS or $SI_1I_2R$ models.

These are stateful models in which a node can be susceptible, infected or (in the SIR models) recovered.

Infected nodes spread the disease to their susceptible neighbors with infection rate $\beta$.

Infected nodes can recover (or become susceptible again) after a time $\tau$.

Transmissibility is defined as $T = 1 - e^{-\beta\tau}$.
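As a numerical illustration, the sketch below assumes the common form T = 1 - exp(-beta * tau), with beta the infection rate and tau the time a node stays infected. The symbol names are assumptions, since the symbols are missing in the extracted slide text.

```python
import math


def transmissibility(beta, tau):
    """Probability that an infected node infects a given neighbor before
    recovering, assuming a constant infection rate beta over a time tau."""
    return 1.0 - math.exp(-beta * tau)


for beta in (0.1, 0.5, 2.0):
    print(beta, round(transmissibility(beta, tau=2.0), 3))
```

The value grows from 0 towards 1 as either the infection rate or the infectious period increases.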


19/48

Epidemic Models

Many variations on the SIR model can be found in the literature. Some add new states to account for birth and death, or for the effects of isolation on the spreading process.

An important one is the Independent Cascade Model, which is a discrete-time version of the SIR model.

A node u infected at time t can try to infect its neighbor v; if it succeeds, v becomes infected at time t+1.

This model is often used in influence spreading studies.
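A minimal simulation sketch of the Independent Cascade Model, assuming a single transmission probability p for every link and an adjacency dictionary keyed by (node, layer) pairs. Both are simplifying assumptions, and the names are hypothetical.

```python
import random


def independent_cascade(neighbors, seeds, p, rng):
    """Return {node: infection time}. Each newly infected node gets exactly
    one chance to infect each of its still-susceptible neighbors."""
    infected_at = {s: 0 for s in seeds}
    frontier = list(seeds)
    t = 0
    while frontier:
        t += 1
        newly_infected = []
        for u in frontier:
            for v in neighbors.get(u, ()):
                if v not in infected_at and rng.random() < p:
                    infected_at[v] = t
                    newly_infected.append(v)
        frontier = newly_infected
    return infected_at


# Two layers; node "b" bridges them through a same-node inter-layer link.
neighbors = {
    ("a", 1): [("b", 1)],
    ("b", 1): [("a", 1), ("b", 2)],
    ("b", 2): [("b", 1), ("c", 2)],
    ("c", 2): [("b", 2)],
}
times = independent_cascade(neighbors, [("a", 1)], p=0.5, rng=random.Random(42))
print(times)
```

Passing the random generator explicitly keeps runs reproducible, which is convenient when averaging infection sizes over many simulations.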


20/48

Decision-Based Models

Decision-Based Models, also called Threshold Models in the physics literature, are based on the idea that each agent decides whether or not to adopt a behaviour depending on its neighbors.

e.g. people may start smoking if their social network includes many smokers.

There are two main approaches in decision-based studies:

Informational Effects Approach

Direct-Benefit Effects Approach


21/48

Decision-Based Models

In the Informational Effects Approach decisions are made based on indirect information about others' choices.

Linear Threshold Model: if a fraction of neighbors greater than $T_{LTM}$ has adopted a new behaviour, then a node will take up the same behaviour.

The Direct-Benefit Effects Approach is a game-theoretic perspective on the problem, where an agent takes up a behaviour if it is convenient.

Ramezanian proposed a model where each node is playing a game with its neighbors. At each round nodes update their strategy (adoption of behaviour A or B) based on a payoff matrix.
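The Linear Threshold rule can be sketched as follows, assuming one common threshold for all nodes and unweighted neighbors (a simplification; the function name and the example graph are hypothetical).

```python
def linear_threshold(neighbors, seeds, threshold):
    """Activate nodes until no inactive node has a fraction of active
    neighbors strictly greater than the threshold."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, nbrs in neighbors.items():
            if node in active or not nbrs:
                continue
            fraction = sum(n in active for n in nbrs) / len(nbrs)
            if fraction > threshold:
                active.add(node)
                changed = True
    return active


graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
print(sorted(linear_threshold(graph, {"a"}, threshold=0.4)))
print(sorted(linear_threshold(graph, {"a"}, threshold=0.6)))
```

Because activation is never undone, the final set does not depend on the order in which nodes are visited.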


22/48

Theoretical Approaches

Widely used in the analysis of stochastic processes, Generating Functions can uniquely determine a discrete sequence of numbers, and can be useful for computing:

probability density functions

moments

limit distributions

solutions of linked differential-difference equations

Generating functions have also been used to study branching and percolation processes, two important stochastic processes for modeling the spread of epidemics over networks.
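As a small worked example of the "moments" item, the sketch below builds the probability generating function G(x) = sum_k p_k x^k of a degree distribution and reads the mean and variance from its derivatives at x = 1. The distribution used is made up for illustration.

```python
def pgf(probabilities):
    """Return G(x) = sum_k p_k * x**k, where p_k = probabilities[k]."""
    def G(x):
        return sum(p * x**k for k, p in enumerate(probabilities))
    return G


def mean_and_variance(probabilities):
    """Mean = G'(1); variance = G''(1) + G'(1) - G'(1)**2."""
    g1 = sum(k * p for k, p in enumerate(probabilities))            # G'(1)
    g2 = sum(k * (k - 1) * p for k, p in enumerate(probabilities))  # G''(1)
    return g1, g2 + g1 - g1**2


p_k = [0.2, 0.5, 0.3]          # P(k=0), P(k=1), P(k=2)
G = pgf(p_k)
mean, variance = mean_and_variance(p_k)
print(G(1.0), mean, variance)
```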


23/48

Branching Process

The branching process model is a simple framework for modeling epidemics on a network.

While infected, an agent may spread the disease with probability p to k other agents (first wave). Each of those can then infect k other agents, spreading the disease to up to $k^2$ individuals (second wave), and so on.

Studying how many waves a process can survive is of major interest.

When state is important (e.g. the SIR model) the branching process cannot be used and bond percolation is preferred.
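A minimal simulation sketch of the wave picture, assuming every infected agent meets exactly k new agents and infects each independently with probability p (names are hypothetical). Under this assumption the expected size of wave n is (p*k)^n, so the process tends to die out when p*k < 1.

```python
import random


def branching_waves(k, p, max_waves, rng):
    """Return the number of newly infected agents in each wave,
    stopping early if a wave infects nobody."""
    sizes = []
    infected = 1  # the initial infected agent
    for _wave in range(max_waves):
        contacts = infected * k
        infected = sum(1 for _contact in range(contacts) if rng.random() < p)
        if infected == 0:
            break
        sizes.append(infected)
    return sizes


print(branching_waves(k=2, p=1.0, max_waves=3, rng=random.Random(0)))
print(branching_waves(k=3, p=0.5, max_waves=10, rng=random.Random(1)))
```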


24/48

Percolation

Percolation theory studies the structure of connected clusters in random graphs.

$p_c$ is the critical probability such that for $p > p_c$ the random graph has a giant connected component (GCC). A percolation transition occurs at the critical occupation probability $p_c$, which is the point of appearance/disappearance of a GCC.

In [102] the authors extend percolation theory to multiplex networks by introducing the concepts of weak bootstrap percolation and weak pruning percolation.

These two models are distinct and give rise to different critical behaviors at the transition, unlike in the single-layer case, where they are equivalent.
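The appearance of a giant component can be observed with a small single-layer experiment. This sketch assumes a random graph built by drawing endpoint pairs uniformly at random (with replacement), keeps each edge with occupation probability p, and measures the largest cluster with a union-find structure. For such graphs the transition is expected near p_c = 1 / mean degree; the function names are hypothetical.

```python
import random
from collections import Counter


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def giant_fraction(n, mean_degree, p, rng):
    """Fraction of nodes in the largest cluster after bond percolation."""
    parent = list(range(n))
    n_edges = int(n * mean_degree / 2)
    for _ in range(n_edges):
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v and rng.random() < p:   # the edge is occupied
            parent[find(parent, u)] = find(parent, v)
    sizes = Counter(find(parent, x) for x in range(n))
    return max(sizes.values()) / n


for p in (0.1, 0.25, 0.5, 0.9):
    print(p, giant_fraction(2000, mean_degree=4, p=p, rng=random.Random(1)))
```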


25/48

Markov-Chain Approximation

The Microscopic Markov-Chain Approximation (MMA) is an established approach to study the microscopic behavior of epidemic dynamics, e.g. the probability that a given node will be infected.

This approach can be further categorized as:

Discrete-time version

Continuous-time version

The discrete-time version has been used to study malware diffusion with the SIS model, showing equivalence between multilayer and single-layer dynamics when the state of a node is the same in all layers.
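A sketch of one discrete-time update for SIS on a single layer, assuming one common variant of the approximation: p_i is the probability that node i is infected, beta the per-contact infection probability, mu the recovery probability, and a node that recovers is not re-infected within the same step. The variable names are assumptions, not taken from the slides.

```python
def mmca_step(adj, p, beta, mu):
    """One step of p_i(t+1) = (1 - p_i) * (1 - q_i) + (1 - mu) * p_i,
    where q_i = prod_j (1 - beta * a_ij * p_j) is the probability that
    no neighbor infects node i."""
    n = len(p)
    nxt = []
    for i in range(n):
        q = 1.0
        for j in range(n):
            if adj[i][j]:
                q *= 1.0 - beta * p[j]
        nxt.append((1.0 - p[i]) * (1.0 - q) + (1.0 - mu) * p[i])
    return nxt


triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]
p = [1.0, 0.0, 0.0]            # node 0 starts infected
for _ in range(3):
    p = mmca_step(triangle, p, beta=0.5, mu=0.2)
    print([round(x, 3) for x in p])
```

For a multilayer network the same update can be applied to the supra-adjacency matrix, with one row per (node, layer) pair.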


26/48

Mean-Field Theory

Large Markovian models may become intractable.

In Mean-Field Theory, a small averaged effect and an external field are considered instead of computing all interactions between agents.

This allows the model to be described by a number of nonlinear differential equations whose state space grows linearly, instead of exponentially.

This method has been used to review and generalize epidemic-like models.
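The simplest instance of this idea is the homogeneous mean-field equation for SIS, di/dt = beta * <k> * i * (1 - i) - mu * i, where i is the infected fraction. The sketch below integrates it with a plain Euler scheme; the single-equation form, the parameter names and the step size are assumptions chosen for illustration.

```python
def mean_field_sis(beta, mu, mean_degree, i0=0.01, dt=0.01, steps=20000):
    """Euler integration of di/dt = beta*<k>*i*(1-i) - mu*i."""
    i = i0
    for _ in range(steps):
        i += dt * (beta * mean_degree * i * (1.0 - i) - mu * i)
    return i


# Above the threshold (beta*<k> > mu) the infection settles at 1 - mu/(beta*<k>).
print(mean_field_sis(beta=0.1, mu=0.2, mean_degree=4))
# Below the threshold it dies out.
print(mean_field_sis(beta=0.02, mu=0.2, mean_degree=4))
```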


27/48

Game Theory

Game-theoretical approaches take into account the effect of cooperation and competition between agents.

Studies on social networks showed that community emergence and information spreading can be explained in terms of payoff maximization and are influenced by features of each agent:

Reputation

Desire of popularity

Knowledge

Information belief


28/48

Spreading Dynamics on Multilayer Networks


29/48

Interconnected Networks

Diffusion processes in interconnected networks are affected by the spectral properties of the combinatorial supra-Laplacian of the underlying graph, which is linked to layer coupling.

In particular, the behaviour of the second eigenvalue shows two very distinct regimes, with layers either decoupled or indistinguishable.

Spreading in interconnected networks has been studied in terms of:

Interaction strength between layers

Inter-layer pattern.
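As a hedged illustration of the spectral remark above, consider two layers A and B on the same n nodes, with Laplacians $L_A$ and $L_B$ and a uniform inter-layer coupling strength $p$. The symbols and the unit intra-layer diffusion constants are assumptions, not taken from the slides. The supra-Laplacian and one of its eigenpairs can then be written as:

```latex
\mathcal{L} =
\begin{pmatrix}
L_A + pI & -pI \\
-pI & L_B + pI
\end{pmatrix},
\qquad
\mathcal{L}
\begin{pmatrix} \mathbf{1} \\ -\mathbf{1} \end{pmatrix}
= 2p
\begin{pmatrix} \mathbf{1} \\ -\mathbf{1} \end{pmatrix}
```

The identity follows from $L_A \mathbf{1} = L_B \mathbf{1} = 0$. Under these assumptions $2p$ is always an eigenvalue; for sufficiently weak coupling and connected layers it is the second-smallest one, which corresponds to the decoupled regime, while for strong coupling the second eigenvalue is expected to approach that of the averaged Laplacian $(L_A + L_B)/2$, where the layers become indistinguishable.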


30/48

Interaction Strength

second-nearest neighbors: expected number of neighbors = $\langle k^2 \rangle / \langle k \rangle$, where $\langle \cdot \rangle$ denotes a moment of the degree distribution. In weakly coupled networks (… A … T … B) we find a phase in which a layer may be in an epidemic state independently of the others, depending on transmissibility and average inter-layer degree.

interconnection topology measure: a quantitative measure of coupling given by the formula:

inter-layer link density: the ratio of existing inter-layer connections to the total possible, $d = m/(n_A \times n_B)$.
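Two of these quantities are simple to compute. The sketch below assumes m is the number of existing inter-layer links, n_a and n_b are the layer sizes, and the neighbor-related quantity is the ratio of the second to the first moment of a degree sequence; function names and sample values are hypothetical.

```python
def interlayer_density(m, n_a, n_b):
    """d = m / (n_A * n_B): existing inter-layer links over all possible ones."""
    return m / (n_a * n_b)


def moment_ratio(degrees):
    """<k^2> / <k> for a degree sequence."""
    n = len(degrees)
    k1 = sum(degrees) / n
    k2 = sum(k * k for k in degrees) / n
    return k2 / k1


print(interlayer_density(m=6, n_a=3, n_b=4))
print(moment_ratio([1, 1, 2, 4]))
```

The moment ratio grows with degree heterogeneity: a few high-degree nodes raise it well above the mean degree.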


31/48

Inter-Layer Patterns

Some studies highlighted the importance of inter-layer links and the patterns they create.

Simulation studies showed that the degree of the connected nodes has less impact than the density of inter-layer links.

A new definition of Epidemic Threshold in the SIS model was proposed, considering the degree of connected nodes:

$T_E = 1/\lambda_1(M + \alpha N)$, where $\lambda_1$ is the largest eigenvalue, $\alpha$ = infection rate, $M$ = adjacency matrix, $N$ = inter-layer matrix.

Another study observed that if the correlation between intra- and inter-layer degree is very strong, an outbreak may appear even below the epidemic threshold of each layer.


32/48

Intra-Layer Structure

Epidemic dynamics depend not only on inter-layer links but also on intra-layer ones.

A study on cliques (groups of people who are close) showed how they influence the epidemic threshold and the infection size and speed. They defined three types of link: (1) intra-clique, (2) inter-clique, and (3) online.

Let $d_w$ and $d_f$ be the number of type 1 and type 2 links per node; there is an epidemic state when:

with E representing the moment of the degree distribution.


33/48

Inter-Layer Similarity

Similarity (or the lack of it) may influence the spreading behaviour.

Degree-Degree Correlation is described by factors where k is the number of intra-layer nodes in each layer. Inter-layer correlation can be measured similarly.

Average Similarity of Neighbors is defined as:

where $K_A$ represents the number of neighbors on layer A and $K_C$ represents the number of common neighbors.

Strong degree correlation leads to a low epidemic threshold and a small infection size. Interestingly, this is not influenced by the average similarity of neighbors.


34/48

Layer-Switching Cost

Some models must consider that diffusion across different layers involves some kind of cost or overhead, e.g. changing means of transport or sharing content from one social network to another.

A recent study considers this and observes the behaviour of the epidemic threshold as a function of node degree and infection rate.

A large difference in infection rates among layers means higher overhead and a higher epidemic threshold.

If a layer is denser, the epidemic threshold lowers as spreading becomes easier.


35/48

Diffusion Velocity

The presence of multiple layers impacts the speed of the diffusion process.

Intuitively, multiple layers speed up the diffusion process because there are more links over which to spread the information. Some studies confirmed this, showing a correlation between coupling and velocity.

However, some empirical studies pointed out that inefficient topologies in monoplex networks and obstructed inter-layer links in multiplex networks lead to decreasing speed.


36/48

Partial Overlap

In partially overlapped multilayer networks only a fraction of the nodes is present in all layers.

A study on the effect of overlapping in the SIR model discovered that the epidemic threshold $T_c$ is directly linked to the fraction $q$ of nodes present in both layers, through a relation that involves the branching factor of layer A.

This means that the epidemic threshold of the layer with lower diffusion capability affects the threshold of the other.


37/48

Interacting Spreading Process

In the real world different spreading processes coexist and interact with each other. Epidemic and game-theoretic models have been presented to address this.

Game-theoretic studies are mainly focused on competing rumors or on companies trying to sell their products.

Epidemic models consider two competing viruses or memes. These two viruses can coexist, or one can dominate the other, eventually leading to its extinction.

An interesting application of the model studies a virus versus an immunization process.

All studies concluded that interaction dynamics are linked to the underlying network topology.


38/48

Innovation Diffusion

Diffusion of innovation (new behaviours, technology, ideas, products) has received considerable interest in social science and economics.

The problem has been studied using an extension of the Watts threshold model.

A content-dependent threshold was introduced, dealing with the specific bias each link has towards certain content.

Considering the direct-benefit effects approach in a game-theoretic framework, a lower bound for the success of an innovation can be found, i.e. how many people in the network adopt a specific strategy.


39/48

Resource Constraints

In real life the nodes of a multiplex network share limited resources. This impacts the dynamics of spreading processes, e.g. people share their time between different online social networks.

A variation of the SIR model called constrained SIR is introduced. In each step, a limited number of neighbors can be infected.

The authors find that, in agreement with previous studies, in the absence of constraints a positive correlation leads to a lower epidemic threshold than a negative correlation.

However, in the presence of constraints, spreading is less efficient in positively correlated than in negatively correlated coupled networks.


40/48

Applications



41/48

Applications

Spreading processes in multilayer networks have a large number of applications:

understanding the dynamics of cascades.

maximizing the influence in the context of viral marketing.

placing sensors to detect the spreading as quickly as possible in a network.

The application areas can be roughly categorized into two classes:

Forward Prediction: applications that need to steer the network into a particular desired state.

Backward Prediction: applications that need to predict how a given piece of information will diffuse in a network.


42/48

Influence Maximization

Influence maximization has the goal of spreading information as quickly as possible.

This can be achieved by choosing the most influential nodes as seeds. These nodes are chosen according to some measure of centrality, such as PageRank, betweenness or eigenvector centrality.

Alternatively, we can choose the messages that are likely to survive longer than the others and so propagate to more nodes, obtaining the same effect.
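Seed selection by centrality can be sketched with a plain power-iteration PageRank, assuming the multilayer network has already been flattened into a single adjacency dictionary in which every neighbor also appears as a key. The damping factor, the iteration count and the function names are illustrative choices.

```python
def pagerank(neighbors, damping=0.85, iterations=100):
    """Power-iteration PageRank on an adjacency dictionary."""
    nodes = list(neighbors)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            out = neighbors[u]
            if out:
                share = damping * rank[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for v in nodes:
                    nxt[v] += damping * rank[u] / n
        rank = nxt
    return rank


def top_seeds(neighbors, k):
    """Pick the k highest-ranked nodes as seeds."""
    rank = pagerank(neighbors)
    return sorted(rank, key=rank.get, reverse=True)[:k]


star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
print(top_seeds(star, 1))
```

Any of the other centrality measures mentioned above could replace `pagerank` here without changing `top_seeds`.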



43/48

Immunization

Resilience to a disease can be achieved through information dissemination. Various works investigated this question using a model based on two layers:

the infection layer: where the disease spreads

the prevention layer: where awareness spreads

Studies have observed that awareness can raise the infection threshold and in many cases almost stop the infection.

An important application is studying the effectiveness of vaccination campaigns.


44/48

Delay-Tolerant Networks

Delay-Tolerant Networks (DTNs) are networks that address the problem of the lack of continuous connectivity, e.g. deep space communication, sensor networks.

Routing on DTNs is more challenging than on traditional networks and is an important application of forward prediction, usually addressed using an epidemic algorithm over the active connection graph.

As every sensor may have more than one communication device, the spreading process can be mapped onto a multilayer network where the best routing is bound by latency and energy constraints.


45/48

Malware Propagation

Studying malware propagation and designing solutions to contain outbreaks is very important and involves both forward and backward prediction.

This problem is intrinsically multilayer: alongside computers we have mobile devices that are connected through multiple wireless interfaces (3G, Bluetooth, Wi-Fi), and the use of applications allows communication with devices that may not be immediate neighbors.

Each of these factors should be taken into account as a separate layer when modelling such a spreading process.


46/48

Conclusions

Information diffusion in multilayer networks is an active and not yet consolidated research field, and therefore offers many unsolved problems to address. In some cases, phenomena that are quite well understood in monoplex networks are comparatively not well understood in the context of multilayer networks; in other cases, completely novel ideas, algorithms and analyses, specific to multilayer networks, have to be developed.

Some research directions are illustrated below.


47/48

Open Problems

empirical study of information diffusion: Real datasets are difficult both to obtain and to study, due to their size.

metrics and measurements: New metrics specific to multilayer networks should be researched, beyond those derived from monoplex networks.

new models: Some phenomena may require new models to be described. An example is the data-mining approach for heterogeneous networks.

data visualization: Visualization is a great tool for researchers; the muxViz project is the best contribution at the moment.

time-varying networks: Time is central to many processes. Studies on time-varying multiplex networks are yet to appear.

evolution of underlying structure: There are studies on adaptive monoplex networks, but the multilayer perspective needs further study.

outbreak detection: Detecting a spreading process as quickly as possible is a field worth exploring in multilayer networks.
