This page reproduces the content of http://www.slideshare.net/fuzzysphere/physics-inspired-approaches-to-community-detection.


Uploaded on 2012/09/07 in Education.


Community structure is one of the most relevant features of graphs in sociology, biology, computer science, and so on. In these slides, the following methods for community detection are reviewed: (1) synchronization and (2) spinglass.

References

[1] A. Arenas, A. Díaz-Guilera, C. J. Pérez-Vicente, Phys. Rev. Lett. 96, 114102 (2006) [arXiv:cond-mat/0511730]

[2] P. Ronhovde, Z. Nussinov, Phys. Rev. E 81, 046114 (2010) [arXiv:0803.2548]

[3] S. Fortunato, Phys. Rep. 486, 74 (2010) [arXiv:0906.0612]

Physics Inspired Approaches to Community Detection

2012-09

Abstract

Community structure is one of the most relevant features of graphs in sociology, biology, computer science, and so on.

In this talk, we review the following methods for community detection:

synchronization

spinglass

1. Introduction

Communities in real-world networks

biological networks

protein interaction, gene regulatory, metabolic, food chain

social networks

SNS, collaborators, phone/email, organization

technical systems

web graph, Internet, power grid

other networks

citation, e-commerce/bidding, stock returns

dynamical phenomena

epidemic, cascade, synchronization, opinion change

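Notation

$G = (V, E)$: graph (network)

$v_i \in V$: vertex (node), $n = |V|$

$(i, j) \in E$: edge (link) from $v_i$ to $v_j$, $2m = \sum_{ij} A_{ij}$

$A_{ij}$: adjacency matrix, $A_{ij} = 1$ if $(i, j) \in E$ and $A_{ij} = 0$ otherwise

$k_i$: degree of $v_i$, $k_i^{\mathrm{out}} = \sum_j A_{ij}$, $k_j^{\mathrm{in}} = \sum_i A_{ij}$

$w_{ij} \ge 0$: weight of $(i, j)$, $w_i^{\mathrm{out}} = \sum_j w_{ij}$, $w_j^{\mathrm{in}} = \sum_i w_{ij}$, $2w = \sum_{ij} w_{ij}$

For undirected graphs, $k_i = k_i^{\mathrm{out}} = k_i^{\mathrm{in}}$ and $w_i = w_i^{\mathrm{out}} = w_i^{\mathrm{in}}$.

$c_s \in C$: community, $q = |C|$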

What is community detection?

Communities are subgraphs within which connections are dense and between which they are sparse.

The concept of community is not rigorously defined and includes some degree of arbitrariness.

Finding an exact solution is NP-hard in most cases.

→ Thus approximation algorithms are needed.

In this talk, we focus on

non-overlapping communities

non-dynamical (static) graphs

sparse graphs: $m = O(n)$

Girvan-Newman algorithm [1]

hierarchical divisive algorithm

iteratively remove the edge with the highest betweenness, and recalculate the betweenness of the remaining edges

$O(m^2 n)$ (for the shortest-path betweenness version)

Alternative definitions of betweenness:

1. shortest-path betweenness

2. flow betweenness

3. random-walk betweenness
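The divisive loop above can be sketched in Python for the shortest-path version. This is a minimal sketch, assuming an unweighted, undirected graph stored as a dict mapping each node to a set of neighbors (a representation chosen here, not taken from the slides). `edge_betweenness` uses Brandes-style accumulation over breadth-first searches, and the hypothetical helper `girvan_newman_split` removes the top edge and recomputes until the graph falls apart.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path edge betweenness of an unweighted, undirected graph
    (Brandes-style accumulation). adj: dict node -> set of neighbours."""
    eb = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: number of shortest paths sigma and predecessor lists
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # accumulate dependencies from the farthest nodes back towards s
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += c
                delta[v] += c
    # every unordered pair of endpoints was counted once from each end
    return {e: b / 2.0 for e, b in eb.items()}


def components(adj):
    """Connected components as a list of sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        stack, comp = [s], set()
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def girvan_newman_split(adj):
    """Remove the highest-betweenness edge, recompute, and repeat until the
    number of components grows; returns the new components."""
    adj = {v: set(nb) for v, nb in adj.items()}  # work on a copy
    n_before = len(components(adj))
    while len(components(adj)) == n_before:
        eb = edge_betweenness(adj)
        u, v = tuple(max(eb, key=eb.get))
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Two triangles joined by the bridge (2, 3)
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
adj = {v: set() for v in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
print(sorted(sorted(c) for c in girvan_newman_split(adj)))  # [[0, 1, 2], [3, 4, 5]]
```

The bridge carries all 9 shortest paths between the two triangles, so it is the first edge removed. Recomputing betweenness after every removal is what makes the method $O(m^2 n)$.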

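Modularity

Many algorithms use the modularity Q as a measure of the goodness of a partition. [2]

$$Q = \frac{1}{2w} \sum_{ij} \left( w_{ij} - \frac{w_i^{\mathrm{out}} w_j^{\mathrm{in}}}{2w} \right) \delta(c_i, c_j) = \sum_{s=1}^{q} \left( \frac{w_{ss}}{w} - \frac{w_s^{\mathrm{out}} w_s^{\mathrm{in}}}{4w^2} \right)$$

$2w_{ss} = \sum_{ij} w_{ij} \delta(c_i, c_s) \delta(c_j, c_s)$, $w_s^{\mathrm{out}} = \sum_{ij} w_{ij} \delta(c_i, c_s)$, $w_s^{\mathrm{in}} = \sum_{ij} w_{ij} \delta(c_j, c_s)$

$c_i$: community to which $v_i$ belongs

1st term: fraction of the weight on within-community edges

2nd term: its expected value in a randomized graph that preserves the node strengths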
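As a minimal sketch, assuming the weights are stored in a Python dict keyed by ordered pairs (i, j), with both (i, j) and (j, i) listed for an undirected edge, Q can be evaluated directly from the community form of the definition above. The function name `modularity` is illustrative.

```python
from collections import defaultdict


def modularity(w, comm):
    """Q = (1/2w) sum_ij [w_ij - w_i^out w_j^in / 2w] delta(c_i, c_j).

    w: dict {(i, j): weight}; an undirected edge appears as (i, j) and (j, i).
    comm: dict node -> community label c_i.
    """
    two_w = sum(w.values())
    s_out, s_in = defaultdict(float), defaultdict(float)  # w_s^out, w_s^in
    inside = 0.0                                          # sum_s 2 w_ss
    for (i, j), x in w.items():
        s_out[comm[i]] += x
        s_in[comm[j]] += x
        if comm[i] == comm[j]:
            inside += x
    expected = sum(s_out[s] * s_in[s] for s in s_out) / two_w
    return (inside - expected) / two_w


# Two triangles joined by the bridge (2, 3)
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
w = {}
for i, j in edges:
    w[(i, j)] = w[(j, i)] = 1.0
print(modularity(w, {v: v // 3 for v in range(6)}))  # 5/14, about 0.357
print(modularity(w, {v: 0 for v in range(6)}))       # one community: 0.0
```

Putting every node into one community always gives Q = 0, because the observed and expected within-community weights then coincide.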

Greedy modularity optimization

hierarchical agglomerative algorithm

iteratively merge the pair of communities that produces the largest increase of modularity

$O((m + n)n)$ [2, Newman]

$O(m d \log n)$, $d$ = (depth of the dendrogram) $\sim \log n$, with a max-heap [4, Clauset-Newman-Moore]

Improvement of the merging strategy [20].
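The agglomerative scheme can be sketched with the merge gain $\Delta Q = 2(e_{rs} - a_r a_s)$ used by Newman's algorithm [2], where $e_{rs}$ is half the fraction of edges joining communities r and s and $a_r$ is the fraction of edge ends attached to r. This sketch assumes an unweighted, undirected edge list without self-loops, and it rescans all connected pairs at each merge instead of using the max-heap of [4], so it only suits small graphs.

```python
from collections import defaultdict


def greedy_modularity(n, edges):
    """Greedy agglomerative modularity maximization on nodes 0..n-1.
    Returns (best_Q, best_partition) over the whole dendrogram."""
    m = len(edges)
    e = defaultdict(dict)   # e[r][s]: half the fraction of edges between r and s
    a = defaultdict(float)  # a[r]: fraction of edge ends attached to r
    for u, v in edges:
        e[u][v] = e[u].get(v, 0.0) + 1 / (2 * m)
        e[v][u] = e[v].get(u, 0.0) + 1 / (2 * m)
        a[u] += 1 / (2 * m)
        a[v] += 1 / (2 * m)
    members = {i: {i} for i in range(n)}
    q = -sum(x * x for x in a.values())  # Q of the all-singleton partition
    best_q, best = q, [set(c) for c in members.values()]
    while True:
        # only connected pairs can increase Q (e_rs = 0 gives a negative gain)
        candidates = [(2 * (e_rs - a[r] * a[s]), r, s)
                      for r in e for s, e_rs in e[r].items() if r < s]
        if not candidates:
            break
        dq, r, s = max(candidates)
        members[r] |= members.pop(s)  # merge s into r
        for t, e_st in e.pop(s).items():
            del e[t][s]
            if t != r:
                e[r][t] = e[r].get(t, 0.0) + e_st
                e[t][r] = e[t].get(r, 0.0) + e_st
        a[r] += a.pop(s)
        q += dq
        if q > best_q:
            best_q, best = q, [set(c) for c in members.values()]
    return best_q, best


edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
print(greedy_modularity(6, edges))
```

The loop keeps merging even after ΔQ turns negative, which builds the full dendrogram; the partition with the largest Q seen along the way is returned.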

Louvain method [25]

1. Assign each node to its own community.

2. Iterate until no change happens:

2.1. Locally optimize each node in sequential order (move it to the neighboring community that gives the largest modularity gain) until no change happens.

2.2. Replace communities by supernodes.

Alternatively, start with q < n randomly assigned communities, and try several initial conditions.

A new random sequential order can be used each time.
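A compact sketch of the two Louvain phases follows. It assumes an unweighted, undirected edge list without isolated nodes and uses a fixed (not random) node order. The graph is kept as a symmetric weight matrix (dict of dicts) whose diagonal stores a supernode's internal weight, so degrees and 2m are preserved by aggregation. A node i, once taken out of its community, is moved to the neighboring community c maximizing $k_{i,c} - k_i \Sigma_{\mathrm{tot}}(c)/2m$, which is proportional to the modularity gain.

```python
from collections import defaultdict


def one_level(A, two_m):
    """Phase 1: move single nodes to the neighbouring community with the
    largest modularity gain until no move improves Q."""
    k = {i: sum(row.values()) for i, row in A.items()}
    comm = {i: i for i in A}   # each node starts in its own community
    tot = dict(k)              # tot[c]: sum of degrees in community c
    improved, moved = False, True
    while moved:
        moved = False
        for i in A:
            ci = comm[i]
            links = defaultdict(float)  # weight from i to each neighbouring community
            for j, w in A[i].items():
                if j != i:
                    links[comm[j]] += w
            tot[ci] -= k[i]             # take i out of its community
            best_c = ci
            best_gain = links[ci] - k[i] * tot[ci] / two_m
            for c, w in links.items():
                gain = w - k[i] * tot[c] / two_m
                if gain > best_gain + 1e-12:
                    best_c, best_gain = c, gain
            tot[best_c] += k[i]         # put i into the best community
            if best_c != ci:
                comm[i] = best_c
                moved = improved = True
    return comm, improved


def aggregate(A, comm):
    """Phase 2: replace each community by a supernode."""
    label = {c: x for x, c in enumerate(sorted(set(comm.values())))}
    B = defaultdict(lambda: defaultdict(float))
    for i, row in A.items():
        for j, w in row.items():
            B[label[comm[i]]][label[comm[j]]] += w
    return {c: dict(row) for c, row in B.items()}, {i: label[comm[i]] for i in A}


def louvain(edges):
    """Returns a dict: original node -> community index."""
    A = defaultdict(lambda: defaultdict(float))
    for u, v in edges:
        A[u][v] += 1.0
        A[v][u] += 1.0
    A = {i: dict(row) for i, row in A.items()}
    two_m = sum(sum(row.values()) for row in A.values())
    node_comm = {i: i for i in A}  # original node -> current supernode
    while True:
        comm, improved = one_level(A, two_m)
        if not improved:
            return node_comm
        A, relabel = aggregate(A, comm)
        node_comm = {v: relabel[c] for v, c in node_comm.items()}


edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
print(louvain(edges))
```

Shuffling the order of `for i in A` between passes would give the random sequential order mentioned above.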

[Figure from Ref. [25]]

[Figure from Ref. [22]]

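Minimize the description length by the Louvain method:

$$L = \left( \sum_s q_s^{\mathrm{exit}} \right) H + \sum_s \left( q_s^{\mathrm{exit}} + \sum_{i: c_i = c_s} p_i \right) H_s$$

$p_i$: stationary probability that a random walker visits $v_i$; $q_s^{\mathrm{exit}}$: probability per step that the walker exits community $s$

$H$ is the entropy of the community-index codebook:

$$H = -\sum_s \frac{q_s^{\mathrm{exit}}}{\sum_r q_r^{\mathrm{exit}}} \log \frac{q_s^{\mathrm{exit}}}{\sum_r q_r^{\mathrm{exit}}}$$

$H_s$ is the entropy of the within-community codebook of community $s$:

$$H_s = -\frac{q_s^{\mathrm{exit}}}{q_s^{\mathrm{exit}} + \sum_{i: c_i = c_s} p_i} \log \frac{q_s^{\mathrm{exit}}}{q_s^{\mathrm{exit}} + \sum_{i: c_i = c_s} p_i} - \sum_{i: c_i = c_s} \frac{p_i}{q_s^{\mathrm{exit}} + \sum_{j: c_j = c_s} p_j} \log \frac{p_i}{q_s^{\mathrm{exit}} + \sum_{j: c_j = c_s} p_j}$$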
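The two-level description length above can be evaluated for a given partition as follows. This sketch assumes an unweighted, undirected graph, for which the random walker's stationary visit rate is $p_i = k_i/2w$ and each direction of a cut edge is traversed with rate $1/2w$; it uses base-2 logarithms (bits). It only evaluates L; a Louvain-style search would then minimize it over partitions.

```python
import math
from collections import defaultdict


def entropy(weights):
    """Shannon entropy (bits) of nonnegative weights normalised by their sum."""
    total = sum(weights)
    return -sum(x / total * math.log2(x / total) for x in weights if x > 0)


def description_length(edges, comm):
    """Two-level map-equation L for an unweighted, undirected edge list.
    comm: dict node -> community label."""
    k = defaultdict(float)
    for u, v in edges:
        k[u] += 1.0
        k[v] += 1.0
    two_w = sum(k.values())
    p = {i: k_i / two_w for i, k_i in k.items()}  # stationary visit rates p_i
    q_exit = defaultdict(float)                    # exit rates q_s^exit
    for u, v in edges:
        if comm[u] != comm[v]:
            q_exit[comm[u]] += 1.0 / two_w
            q_exit[comm[v]] += 1.0 / two_w
    modules = sorted(set(comm[i] for i in p))
    exits = [q_exit[s] for s in modules]
    # community-index codebook, used with total rate sum_s q_s^exit
    L = sum(exits) * entropy(exits) if sum(exits) > 0 else 0.0
    # within-community codebooks, used with rate q_s^exit + sum_{i in s} p_i
    for s in modules:
        inside = [p[i] for i in p if comm[i] == s]
        L += (q_exit[s] + sum(inside)) * entropy([q_exit[s]] + inside)
    return L


edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
one = {v: 0 for v in range(6)}
two = {v: v // 3 for v in range(6)}
print(description_length(edges, one), description_length(edges, two))
```

With a single community the index term vanishes and L reduces to the entropy of the visit rates; splitting the two triangles shortens the description, which is why the two-community partition is preferred.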

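Comparative evaluation

Synthetic benchmark graphs with tunable

degree distribution

community size distribution

density ratio of within-/inter-community edges

hierarchical community structure

The similarity between clusterings X and Y is measured by the normalized mutual information

$$I_N(X, Y) = \frac{-2 \sum_{r,s} \frac{N_{rs}}{N} \log \frac{N_{rs} N}{N_{r*} N_{*s}}}{\sum_r \frac{N_{r*}}{N} \log \frac{N_{r*}}{N} + \sum_s \frac{N_{*s}}{N} \log \frac{N_{*s}}{N}}$$

$N_{rs}$: number of nodes assigned to community r by X and to community s by Y; $N$: total number of nodes

$N_{*s} = \sum_r N_{rs}$, $N_{r*} = \sum_s N_{rs}$

$I_N = 1$ for identical partitions and $I_N = 0$ for independent ones.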
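The normalized mutual information above translates directly into code. This is a minimal sketch, assuming each partition is given as a list of community labels indexed by node; returning 1.0 when both partitions are a single community (where the formula is 0/0) is a convention chosen here.

```python
import math
from collections import Counter


def nmi(x, y):
    """Normalized mutual information I_N(X, Y) of two partitions given as
    equal-length label sequences (x[v], y[v] = community of node v)."""
    n = len(x)
    n_rs = Counter(zip(x, y))  # N_rs
    n_r = Counter(x)           # N_r*
    n_s = Counter(y)           # N_*s
    # the common factor 1/N cancels between numerator and denominator
    num = -2.0 * sum(c * math.log(c * n / (n_r[r] * n_s[s]))
                     for (r, s), c in n_rs.items())
    den = (sum(c * math.log(c / n) for c in n_r.values())
           + sum(c * math.log(c / n) for c in n_s.values()))
    if den == 0.0:  # both partitions consist of a single community
        return 1.0
    return num / den


print(nmi([0, 0, 1, 1], [5, 5, 7, 7]))  # same partition, relabelled
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # independent partitions
```

Only the co-occurrence counts enter the formula, so the result does not depend on how the communities are labelled.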

2. Synchronization

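Kuramoto model

$$\dot\theta_i = \omega_i + \frac{K}{n} \sum_j \sin(\theta_j - \theta_i)$$

$\omega_i$ is distributed according to $g(\omega)$ with zero mean. Assume $g(\omega)$ is unimodal and $g(\omega) = g(-\omega)$.

A mean-field order parameter defined by

$$r e^{i\phi} = \frac{1}{n} \sum_j e^{i\theta_j}$$

yields

$$\dot\theta_i = \omega_i + K r \sin(\phi - \theta_i).$$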
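The mean-field form can be integrated directly, since each step only needs $r$ and $\phi$. This sketch assumes a standard normal $g(\omega)$, explicit Euler steps, and run parameters (n = 300, dt = 0.05, 1500 steps) picked for illustration only; it reports $r$ averaged over the second half of the run.

```python
import math
import random


def kuramoto_order(K, n=300, dt=0.05, steps=1500, seed=1):
    """Simulate the mean-field Kuramoto model with standard-normal natural
    frequencies and return r averaged over the second half of the run."""
    rng = random.Random(seed)
    omega = [rng.gauss(0.0, 1.0) for _ in range(n)]              # omega_i ~ g
    theta = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]  # random phases
    samples = []
    for step in range(steps):
        # r e^{i phi} = (1/n) sum_j e^{i theta_j}
        c = sum(math.cos(t) for t in theta) / n
        s = sum(math.sin(t) for t in theta) / n
        r, phi = math.hypot(c, s), math.atan2(s, c)
        if step >= steps // 2:
            samples.append(r)
        # Euler step of d theta_i / dt = omega_i + K r sin(phi - theta_i)
        theta = [t + dt * (w + K * r * math.sin(phi - t))
                 for t, w in zip(theta, omega)]
    return sum(samples) / len(samples)


# For a standard normal g, K_c = 2 / (pi g(0)) = 2 sqrt(2 pi) / pi, about 1.6
for K in (0.5, 4.0):
    print(K, round(kuramoto_order(K), 3))
```

Below $K_c$ the order parameter stays at the finite-size noise level of order $1/\sqrt{n}$; well above it, most oscillators lock and $r$ approaches 1.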

The solutions exhibit two types of long-term behavior:

$|\omega_i| \le K r$: locked

$|\omega_i| > K r$: drifting

Demanding that the drifting oscillators form a stationary distribution leads to the self-consistent equation

$$r = K r \int_{-\pi/2}^{\pi/2} d\theta \, \cos^2\theta \, g(K r \sin\theta).$$

A nontrivial solution ($r > 0$) exists beyond

$$K_c = \frac{2}{\pi g(0)}.$$
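The self-consistent equation can be solved numerically and checked against a closed form. This sketch assumes a Lorentzian $g(\omega) = \gamma / (\pi(\omega^2 + \gamma^2))$, for which $g(0) = 1/(\pi\gamma)$, $K_c = 2\gamma$, and Kuramoto's classic result $r = \sqrt{1 - K_c/K}$ holds. Dividing by $r$, the nontrivial branch is the root of $K \int \cos^2\theta \, g(K r \sin\theta) d\theta = 1$, found here by a midpoint-rule integral and bisection (grid size and tolerance are arbitrary choices).

```python
import math


def lorentzian(w, gamma=1.0):
    """g(omega) = gamma / (pi (omega^2 + gamma^2)); K_c = 2 gamma."""
    return gamma / (math.pi * (w * w + gamma * gamma))


def rhs(r, K, g, n_grid=2000):
    """K * integral_{-pi/2}^{pi/2} cos^2(theta) g(K r sin theta) d theta (midpoint rule)."""
    h = math.pi / n_grid
    total = 0.0
    for k in range(n_grid):
        th = -math.pi / 2 + (k + 0.5) * h
        total += math.cos(th) ** 2 * g(K * r * math.sin(th))
    return K * total * h


def solve_r(K, g, tol=1e-10):
    """Nontrivial root of rhs(r) = 1 on (0, 1] by bisection, or 0.0 when only
    the incoherent state exists. Assumes g is even and unimodal, so that
    rhs decreases with r."""
    lo, hi = 1e-12, 1.0
    if rhs(lo, K, g) <= 1.0:  # rhs(0+) = K / K_c
        return 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rhs(mid, K, g) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for K in (1.5, 3.0, 4.0):
    exact = math.sqrt(1 - 2.0 / K) if K > 2.0 else 0.0
    print(K, round(solve_r(K, lorentzian), 5), round(exact, 5))
```

The check at small $r$ reproduces the threshold in the text: $\mathrm{rhs}(0^+) = K g(0) \pi / 2 = K / K_c$.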

Synchronization on networks [28, review]

$$\dot\theta_i = \omega_i + K \sum_j A_{ij} \sin(\theta_j - \theta_i)$$

Alternative choices of the coupling constant:

$$A_{ij} \to \frac{A_{ij}}{n}, \quad \frac{A_{ij}}{k_i}, \quad \frac{A_{ij}}{\langle k \rangle}.$$

For simplicity we focus on undirected, unweighted graphs with $k_i \gg 1$.

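A local order parameter defined by

$$r_i e^{i\phi_i} = \sum_j A_{ij} \langle e^{i\theta_j} \rangle_t$$

yields

$$\dot\theta_i = \omega_i - K r_i \sin(\theta_i - \phi_i) - K h_i(t),$$

$$h_i(t) = \operatorname{Im}\left\{ e^{-i\theta_i} \sum_j A_{ij} \left( \langle e^{i\theta_j} \rangle_t - e^{i\theta_j} \right) \right\}.$$

$h_i$ is assumed to be negligible. Look for stationary solutions:

$$r_i = \sum_{|\omega_j| \le K r_j} A_{ij} e^{i(\theta_j - \phi_i)} + \sum_{|\omega_j| > K r_j} A_{ij} \langle e^{i(\theta_j - \phi_i)} \rangle_t.$$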

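Suppose $(r_i, \phi_i)$ is statistically independent of $\omega_i$. Ignoring the 2nd (drifting) term gives

$$r_i = \sum_{|\omega_j| \le K r_j} A_{ij} \cos(\phi_j - \phi_i) \sqrt{1 - \left( \frac{\omega_j}{K r_j} \right)^2}.$$

A critical coupling is obtained when $\cos(\phi_j - \phi_i) = 1$. The continuum approximation leads to

$$r_i = K \sum_j A_{ij} r_j \int_{-1}^{1} dx \, g(K r_j x) \sqrt{1 - x^2}$$

$$\to \quad K_c = \frac{2}{\pi g(0)} \frac{1}{\lambda_{\max}},$$

where $\lambda_{\max}$ is the largest eigenvalue of $A$.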
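The network threshold only needs $\lambda_{\max}$, which power iteration provides without a linear-algebra library. This sketch assumes a connected, undirected graph given as a dict of neighbor lists. It iterates on $A + I$ rather than $A$, because for a bipartite graph $-\lambda_{\max}$ is also an eigenvalue of $A$ and plain power iteration would oscillate. The Gaussian $g(0) = 1/\sqrt{2\pi}$ in the usage lines is an illustrative choice.

```python
import math


def largest_eigenvalue(adj, iters=1000):
    """lambda_max of the adjacency matrix of a connected, undirected graph
    via power iteration on A + I. adj: dict node -> list of neighbours."""
    x = {v: 1.0 for v in adj}
    for _ in range(iters):
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}  # (A + I) x
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {v: val / norm for v, val in y.items()}
    # Rayleigh quotient x^T A x with |x| = 1
    return sum(x[v] * sum(x[u] for u in adj[v]) for v in adj)


def critical_coupling(adj, g0):
    """K_c = 2 / (pi g(0) lambda_max)."""
    return 2.0 / (math.pi * g0 * largest_eigenvalue(adj))


star = {0: [1, 2, 3, 4, 5], **{i: [0] for i in range(1, 6)}}
complete = {i: [j for j in range(5) if j != i] for i in range(5)}
g0 = 1.0 / math.sqrt(2.0 * math.pi)  # standard normal g(omega)
print(largest_eigenvalue(star), math.sqrt(5))  # both about 2.2361
print(critical_coupling(complete, g0))
```

For the complete graph $\lambda_{\max} = n - 1$, so $K_c (n - 1) = 2/(\pi g(0))$, which matches the mean-field threshold once the coupling is rescaled by the number of neighbors. Hubs raise $\lambda_{\max}$ and therefore lower $K_c$.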

Dynamical clustering [9]

Suppose the oscillators are identical: $\omega_i = \omega$ for all $i$.

→ Then full synchronization ($\theta_i = \theta$ for all $i$) is possible.

Dense (sparse) subgraphs synchronize rapidly (slowly).

Start from random initial conditions I, and calculate a local order parameter

$$\rho_{ij}(t) = \langle \cos(\theta_i(t) - \theta_j(t)) \rangle_I$$

and the dynamical connectivity matrix

$$\mathcal{D}_t(T)_{ij} = \begin{cases} 1 & \rho_{ij}(t) > T \\ 0 & \text{otherwise} \end{cases} \qquad (T: \text{threshold}).$$
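The procedure above can be sketched for a toy graph of two 5-cliques joined by one edge. Assumptions made here: identical frequencies are removed by moving to the co-rotating frame, K = 1, explicit Euler steps with dt = 0.01, 50 random initial conditions, and the threshold T placed between the two block averages purely for illustration.

```python
import math
import random


def rho_matrix(adj, t_end, n_init=50, dt=0.01, K=1.0, seed=0):
    """rho_ij(t) = <cos(theta_i(t) - theta_j(t))>_I for identical Kuramoto
    oscillators (co-rotating frame), averaged over n_init random initial
    conditions. adj: list of neighbour lists, nodes 0..n-1."""
    rng = random.Random(seed)
    n = len(adj)
    rho = [[0.0] * n for _ in range(n)]
    for _ in range(n_init):
        theta = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
        for _ in range(int(round(t_end / dt))):
            # Euler step of d theta_i / dt = K sum_j A_ij sin(theta_j - theta_i)
            theta = [theta[i] + dt * K * sum(math.sin(theta[j] - theta[i]) for j in adj[i])
                     for i in range(n)]
        for i in range(n):
            for j in range(n):
                rho[i][j] += math.cos(theta[i] - theta[j]) / n_init
    return rho


def connectivity(rho, T):
    """Dynamical connectivity matrix D_t(T)_ij = 1 if rho_ij(t) > T else 0."""
    return [[1 if x > T else 0 for x in row] for row in rho]


# Two 5-cliques {0..4}, {5..9} joined by the edge (4, 5)
blocks = [range(0, 5), range(5, 10)]
adj = [[] for _ in range(10)]
for b in blocks:
    for i in b:
        adj[i] += [j for j in b if j != i]
adj[4].append(5)
adj[5].append(4)

rho = rho_matrix(adj, t_end=2.0)
same = [rho[i][j] for b in blocks for i in b for j in b if i != j]
cross = [rho[i][j] for i in blocks[0] for j in blocks[1]]
intra_mean, inter_mean = sum(same) / len(same), sum(cross) / len(cross)
print(intra_mean, inter_mean)
D = connectivity(rho, T=0.5 * (intra_mean + inter_mean))
```

At intermediate times the cliques are internally synchronized while the single bridge has not yet aligned them with each other, so the connected components of D recover the two communities; at much later times everything synchronizes and the structure disappears.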

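Consider the linearized model

$$\dot\theta_i = -K \sum_j L_{ij} \theta_j, \qquad L_{ij} = k_i \delta_{ij} - A_{ij},$$

whose solution in terms of normal modes is

$$\varphi_i(t) = \sum_j B_{ij} \theta_j(t) = \varphi_i(0) e^{-K \lambda_i t}.$$

$B_{ij}$: matrix of eigenvectors of $L_{ij}$; $\lambda_i$: its eigenvalues

Suppose the adjacency matrix is symmetric. Then the eigenvalues of the Laplacian matrix $L_{ij}$ are real and satisfy

$$0 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n.$$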
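The time scales $1/(K\lambda_i)$ come from the Laplacian spectrum. As a dependency-free sketch, the eigenvalues of the symmetric matrix $L$ can be computed with cyclic Jacobi rotations (a standard method chosen here for illustration; any symmetric eigensolver would do). The example graph, two triangles joined by one edge, is hypothetical; its spectrum is $0, (5 - \sqrt{17})/2, 3, 3, 3, (5 + \sqrt{17})/2$.

```python
import math


def laplacian(n, edges):
    """L_ij = k_i delta_ij - A_ij for an unweighted, undirected graph."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][v] -= 1.0
        L[v][u] -= 1.0
        L[u][u] += 1.0
        L[v][v] += 1.0
    return L


def symmetric_eigenvalues(M, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in M]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                # rotation angle chosen so that the new a[p][q] is zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q of A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rows p, q of P^T (A P)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
eig = symmetric_eigenvalues(laplacian(6, edges))
print(eig)
```

The large gap between $\lambda_2 \approx 0.44$ and $\lambda_3 = 3$ means the two triangles synchronize internally long before they synchronize with each other, which is the plateau picture of the figure that follows.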

[Figure: panels labelled 13-4 and 15-2; upper row: i versus time, lower row: i versus 1/λ_i. Ref. [9]]

Plateaus indicate stable community structures.

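Other dynamical clustering algorithms

Opinion changing rate model:

$$\dot x_i = \omega_i + \frac{K}{\sum_j b_{ij}^{\alpha(t)}} \sum_j b_{ij}^{\alpha(t)} A_{ij} \, \beta \sin(x_j - x_i) \, e^{-\beta |x_j - x_i|}$$

Rössler oscillators:

$$\dot{\mathbf{x}}_i = \mathbf{F}(\mathbf{x}_i) - \frac{K}{\sum_j b_{ij}^{\alpha(t)}} \sum_j b_{ij}^{\alpha(t)} L_{ij} \mathbf{H}(\mathbf{x}_i - \mathbf{x}_j)$$

$\mathbf{F} = (-y - z, \; x + a y, \; b + z(x - c))$, $\mathbf{H} = (x, 0, 0)$

$b_{ij}$: betweenness of edge $(i, j)$; $\alpha(t) = 0$, or $\alpha(0) = 0$ with $\dot\alpha \le 0$

Starting from a fully synchronized state plus small disorder, the system splits into communities as time goes on.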

3. Spinglass

Community detection by spinglass

spin $\sigma_i$ = community to which node $v_i$ belongs ($c_i$)

ground state = optimal partition

Start with a random initial condition.

Minimize the energy of the spinglass Potts model

$$H(\{\sigma\}) = -\sum_{ij} \left( a_{ij} A_{ij} - b_{ij} (1 - A_{ij}) \right) \delta(\sigma_i, \sigma_j)$$

by using

simulated annealing, or

a Louvain-like method [26].
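The simulated-annealing route can be sketched with Metropolis single-spin moves. Assumptions made here: $a_{ij} = b_{ij} = 1$, so the coupling is $J_{ij} = A_{ij} - (1 - A_{ij})$; the sum runs over ordered pairs $i \ne j$ (the diagonal only adds a constant); spins take q = n values; and the geometric cooling schedule from T = 2 to T = 0.01 is an arbitrary illustrative choice. The lowest-energy state seen during the run is returned.

```python
import math
import random


def potts_energy(J, sigma):
    """H = -sum_{i != j} J_ij delta(sigma_i, sigma_j) with symmetric J."""
    n = len(J)
    return -sum(J[i][j] for i in range(n) for j in range(n)
                if i != j and sigma[i] == sigma[j])


def anneal(J, q, sweeps=2000, t0=2.0, t1=0.01, seed=0):
    rng = random.Random(seed)
    n = len(J)
    sigma = [rng.randrange(q) for _ in range(n)]  # random initial condition
    energy = potts_energy(J, sigma)
    best, best_energy = list(sigma), energy
    for sweep in range(sweeps):
        temp = t0 * (t1 / t0) ** (sweep / (sweeps - 1))  # geometric cooling
        for _ in range(n):
            i, new = rng.randrange(n), rng.randrange(q)
            old = sigma[i]
            if new == old:
                continue
            # dH = -2 [sum_{j in new} J_ij - sum_{j in old, j != i} J_ij]
            d = -2 * sum(J[i][j] * ((sigma[j] == new) - (sigma[j] == old))
                         for j in range(n) if j != i)
            if d <= 0 or rng.random() < math.exp(-d / temp):
                sigma[i] = new
                energy += d
                if energy < best_energy:
                    best, best_energy = list(sigma), energy
    return best, best_energy


# Two 4-cliques {0..3}, {4..7} joined by the edge (3, 4)
n = 8
A = [[0] * n for _ in range(n)]
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i][j] = 1
A[3][4] = A[4][3] = 1
J = [[A[i][j] - (1 - A[i][j]) for j in range(n)] for i in range(n)]
labels, energy = anneal(J, q=n)
print(labels, energy)
```

Each clique contributes $-12$ (12 ordered linked pairs), so the two-clique ground state has $H = -24$; merging the cliques would add 30 unlinked pairs at $+1$ each and only 2 linked ones.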

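Reichardt-Bornholdt model [10]

$$H_{\mathrm{RB}} = -\sum_{ij} \left( A_{ij} w_{ij} - \gamma p_{ij} \right) \delta(\sigma_i, \sigma_j), \qquad p_{ij} = \frac{w_i w_j}{2w}$$

$p_{ij}$ is the expected value of $w_{ij}$ in a "null model" (the configuration model in the above).

$$Q = -\frac{H_{\mathrm{RB}}|_{\gamma = 1}}{2w} \qquad (2w = 2m \text{ for unweighted graphs})$$

Use the Erdős-Rényi model as the null model:

$$H_{\mathrm{RB\text{-}ER}} = -\sum_{ij} \left( A_{ij} w_{ij} - \gamma p_{ij} \right) \delta(\sigma_i, \sigma_j), \qquad p_{ij} = p \langle w \rangle$$

($p$: edge probability, $\langle w \rangle$: mean edge weight)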
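The relation between $H_{\mathrm{RB}}$ and modularity can be checked numerically. This sketch assumes the configuration-model null model, weights stored as a dict keyed by ordered pairs (both directions listed), and the sum taken over all ordered pairs including $i = j$, which is the convention under which $Q = -H_{\mathrm{RB}}|_{\gamma=1}/2w$ holds exactly.

```python
from collections import defaultdict


def h_rb(w, comm, gamma=1.0):
    """H_RB = -sum_ij (A_ij w_ij - gamma w_i w_j / 2w) delta(sigma_i, sigma_j).
    w: dict {(i, j): weight}, both directions listed for undirected edges."""
    two_w = sum(w.values())
    strength = defaultdict(float)  # w_i
    for (i, _), x in w.items():
        strength[i] += x
    h = 0.0
    for i in strength:
        for j in strength:
            if comm[i] == comm[j]:
                h -= w.get((i, j), 0.0) - gamma * strength[i] * strength[j] / two_w
    return h


# Two triangles joined by the bridge (2, 3)
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
w = {}
for i, j in edges:
    w[(i, j)] = w[(j, i)] = 1.0
comm = {v: v // 3 for v in range(6)}
h = h_rb(w, comm)
print(h, -h / sum(w.values()))  # H_RB = -5 and Q = 5/14
```

Raising γ increases the penalty for large communities, which is how the model tunes its resolution.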

Arenas-Fernández-Gómez model [21]

$$H_{\mathrm{AFG}} = -\sum_{ij} \left( A_{ij} w_{ij} + r \delta_{ij} - p_{ij}(r) \right) \delta(\sigma_i, \sigma_j), \qquad p_{ij}(r) = \frac{(w_i + r)(w_j + r)}{2w + n r}$$

A self-loop with weight r is added to each node.

Ronhovde-Nussinov model [26]

$$H_{\mathrm{RN}} = -\sum_{ij} \left( A_{ij} w_{ij} - \gamma (1 - A_{ij}) \right) \delta(\sigma_i, \sigma_j)$$

No global parameter (such as $2w$) is included.

The weights of missing links are taken to be γ.

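Minimizing the Hamiltonian

H is costly to evaluate, but the energy change of a single-node move is cheap:

$$\Delta H_{\mathrm{RB}}(\sigma_i = r \to s) = \sum_{j \ne i} \left( w_{ij} - \gamma \frac{w_i w_j}{2w} \right) \delta(r, \sigma_j) - \sum_{j \ne i} \left( w_{ij} - \gamma \frac{w_i w_j}{2w} \right) \delta(s, \sigma_j)$$

$$= \sum_{j \ne i} w_{ij} \left( \delta(r, \sigma_j) - \delta(s, \sigma_j) \right) - \frac{\gamma w_i}{2w} \left( w_r - w_i - w_s \right)$$

where $w_r = \sum_{j: \sigma_j = r} w_j$ is the total strength of community r (including $v_i$).

To calculate ΔH, only the following are necessary:

states of neighbors

some global bookkeeping ($w_r$ in the above)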
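The local formula can be checked against a brute-force recomputation of $H_{\mathrm{RB}}$. In this sketch the neighbor dict `nbrs`, the strength table `k`, and the community totals `tot` (the global bookkeeping) are hypothetical names. Because $H_{\mathrm{RB}}$ sums over ordered pairs, each pair touching $v_i$ appears twice for symmetric weights, so the exact change of the full H is twice the expression above; the factor does not affect which move is best.

```python
from collections import defaultdict


def h_rb(w, comm, gamma=1.0):
    """Full H_RB over ordered pairs, O(n^2): used only as a reference."""
    s = defaultdict(float)
    for (i, _), x in w.items():
        s[i] += x
    two_w = sum(w.values())
    return -sum(w.get((i, j), 0.0) - gamma * s[i] * s[j] / two_w
                for i in s for j in s if comm[i] == comm[j])


def delta_h_rb(i, new, nbrs, k, comm, tot, two_w, gamma=1.0):
    """Slide formula for moving v_i from r = comm[i] to s = new.
    nbrs[i]: neighbour -> w_ij, k[i]: strength w_i, tot[c]: total strength w_c."""
    r = comm[i]
    local = sum(x * ((comm[j] == r) - (comm[j] == new))
                for j, x in nbrs[i].items() if j != i)
    return local - gamma * k[i] * (tot[r] - k[i] - tot.get(new, 0.0)) / two_w


edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
w = {}
for u, v in edges:
    w[(u, v)] = w[(v, u)] = 1.0
nbrs = defaultdict(dict)
for (u, v), x in w.items():
    nbrs[u][v] = x
k = {u: sum(row.values()) for u, row in nbrs.items()}
two_w = sum(w.values())
comm = {v: v // 3 for v in range(6)}
tot = defaultdict(float)
for v, kv in k.items():
    tot[comm[v]] += kv

d = delta_h_rb(2, 1, nbrs, k, comm, tot, two_w)  # move v_2 into community 1
moved = dict(comm)
moved[2] = 1
print(d, h_rb(w, moved) - h_rb(w, comm))  # exact change = 2 * d
```

After an accepted move only `tot[r]` and `tot[s]` need updating, which keeps each step proportional to the degree of the moved node.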

1. Initialize: start from a state wherein each node forms its own community.

2. Optimize: iterate until convergence:

2.1. Iterate until convergence:

2.1.1. Sequentially pick up each node.

2.1.2. Calculate the energy change as if it were moved to each neighboring community.

2.1.3. Assign the node to the community with the lowest energy.

2.2. (node level) Replace communities by supernodes; (supernode level) return to the node level.

Another way is to start with q randomly assigned communities, and try several initial conditions.

Optionally, use a new random sequential order each time.

[Figure: (a) hierarchy level 2, (b) hierarchy level 3; V for APM and RBCM with t = 1 and t = 4 trials, plotted against a logarithmic horizontal axis (10^-1 to 10^1). Ref. [26]]

Determine the resolution by replicas [32]

Calculate S(γ) for each γ:

generate r replicas by reordering the nodes

optimize the replicas independently

average $I_N(A, B)$ over all pairs of replicas:

$$S(\gamma) = \frac{2}{r(r - 1)} \sum_{(A, B)} I_N(A, B)$$

Select the γ with the strongest correlation, or γ on plateaus.

Another way is to minimize $F = \sum_{\mathrm{replica}} H(\gamma) - T S(\gamma)$ directly.
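The replica average $S(\gamma)$ is a mean of pairwise normalized mutual informations. In this sketch the replica partitions are hard-coded, hypothetical label lists; in practice they would come from independent optimizer runs at the same γ with shuffled node orders (not shown). The `nmi` helper implements the $I_N$ formula of the comparative-evaluation slide.

```python
import math
from collections import Counter
from itertools import combinations


def nmi(x, y):
    """Normalized mutual information I_N of two label sequences."""
    n = len(x)
    n_rs, n_r, n_s = Counter(zip(x, y)), Counter(x), Counter(y)
    num = -2.0 * sum(c * math.log(c * n / (n_r[r] * n_s[s]))
                     for (r, s), c in n_rs.items())
    den = (sum(c * math.log(c / n) for c in n_r.values())
           + sum(c * math.log(c / n) for c in n_s.values()))
    return 1.0 if den == 0.0 else num / den


def replica_similarity(replicas):
    """S = 2 / (r (r - 1)) * sum over replica pairs (A, B) of I_N(A, B)."""
    r = len(replicas)
    return 2.0 / (r * (r - 1)) * sum(nmi(a, b) for a, b in combinations(replicas, 2))


# Hypothetical replicas at some gamma: three agree exactly, one does not
replicas = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1], [0, 1, 0, 1]]
print(replica_similarity(replicas))
```

When all replicas find the same partition (up to relabelling) S = 1; disagreement between replicas, typical of an ill-chosen γ, lowers S.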

[Figure: (a) I_N, I and q; (b) V, H and q; each plotted against a logarithmic horizontal axis (10^-1 to 10^1), with regions (ia), (iia), (ib), (iib) marked. Ref. [32]]

Resolution limit

A model that includes a global parameter tends to have a resolution limit.

$H_{\mathrm{RB}}$ has a resolution limit of order $\sqrt{m/\gamma}$ (communities with fewer internal links than this may be merged).

$H_{\mathrm{RN}}$ is "resolution-limit-free".

Definition

Let $C = \{C_i\}$ be an H-optimal partition. H is resolution-limit-free if, for each subgraph induced by $D \subset C$, the partition D is also H-optimal. [38]
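The resolution limit of modularity ($H_{\mathrm{RB}}$ at γ = 1) can be seen in the ring-of-cliques example of Fortunato and Barthélemy [11]: $n_c$ cliques $K_m$ arranged in a ring, each joined to the next by a single link. A short calculation gives that merging adjacent pairs of cliques yields a higher Q than keeping each clique separate exactly when $n_c > m(m - 1) + 2$. This sketch checks that for m = 5 (threshold 22) with an even $n_c$, so that cliques can be paired.

```python
from collections import defaultdict


def ring_of_cliques(n_c, m):
    """n_c cliques K_m; consecutive cliques joined by a single link."""
    edges = []
    for c in range(n_c):
        base = c * m
        edges += [(base + a, base + b) for a in range(m) for b in range(a + 1, m)]
        edges.append((base + m - 1, ((c + 1) % n_c) * m))
    return edges


def modularity(edges, comm):
    """Q = l_in / m - sum_s d_s^2 / (4 m^2) for an unweighted edge list."""
    m = len(edges)
    tot = defaultdict(int)  # d_s: total degree of community s
    inside = 0              # l_in: number of edges inside communities
    for u, v in edges:
        tot[comm[u]] += 1
        tot[comm[v]] += 1
        inside += comm[u] == comm[v]
    return inside / m - sum(d * d for d in tot.values()) / (4.0 * m * m)


def compare(n_c, m=5):
    edges = ring_of_cliques(n_c, m)
    single = {v: v // m for v in range(n_c * m)}       # one community per clique
    pairs = {v: v // (2 * m) for v in range(n_c * m)}  # adjacent cliques merged
    return modularity(edges, single), modularity(edges, pairs)


for n_c in (10, 30):
    q_single, q_pairs = compare(n_c)
    print(n_c, round(q_single, 4), round(q_pairs, 4))
```

For $n_c = 30$ the merged partition scores higher even though every clique is an obvious community, because the null-model term depends on the global quantity m; this is the effect the resolution-limit-free definition rules out.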

4. Summary

Summary

We have reviewed two approaches to community detection,

synchronization

spinglass

as well as the problem of the resolution limit.

Which algorithm should we use?

No single algorithm is the best choice in all cases. [37]

Use multiresolution algorithms for graphs with heterogeneous/hierarchical community structure.

Other algorithms [35, review]:

label propagation

Bayesian inference

spectral methods

Markov clustering (MCL)

clique percolation

Related topics

overlapping communities

dynamical/adaptive graphs

multigraphs

parallel computation

fin.

References

[1] M. E. J. Newman, M. Girvan, “Finding and evaluating community structure in networks,” Phys. Rev. E 69, 026113 (2004) [cond-mat/0308127].

[2] M. E. J. Newman, “Fast algorithm for detecting community structure in networks,” Phys. Rev. E 69, 066133 (2004) [cond-mat/0309508].

[3] J. Reichardt, S. Bornholdt, “Detecting fuzzy community structures in complex networks with a Potts model,” Phys. Rev. Lett. 93, 218701 (2004) [cond-mat/0402349].

[4] A. Clauset, M. E. J. Newman, C. Moore, “Finding community structure in very large networks,” Phys. Rev. E 70, 066111 (2004) [cond-mat/0408187].

[5] A. Pluchino, V. Latora, A. Rapisarda, “Changing Opinions in a Changing World: a New Perspective in Sociophysics,” Int. J. Mod. Phys. C 16, 515 (2005) [cond-mat/0410217].

[6] L. Danon, J. Duch, A. Díaz-Guilera, A. Arenas, “Comparing community structure identification,” J. Stat. Mech. P09008 (2005) [cond-mat/0505245].

[7] S. H. Yook, H. Meyer-Ortmanns, “Synchronization of Rössler Oscillators on Scale-free Topologies,” Physica A 371, 781 (2006) [cond-mat/0507422].

[8] S. W. Son, H. Jeong, J. D. Noh, “Random field Ising model and community structure in complex networks,” Eur. Phys. J. B 50, 431 (2006) [cond-mat/0502672].

[9] A. Arenas, A. Díaz-Guilera, C. J. Pérez-Vicente, “Synchronization reveals topological scales in complex networks,” Phys. Rev. Lett. 96, 114102 (2006) [cond-mat/0511730].

[10] J. Reichardt, S. Bornholdt, “Statistical Mechanics of Community Detection,” Phys. Rev. E 74, 016110 (2006) [cond-mat/0603718].

[11] S. Fortunato, M. Barthélemy, “Resolution limit in community detection,” Proc. Natl. Acad. Sci. 104, 36 (2007) [physics/0607100].

[12] S. Boccaletti, M. Ivanchenko, V. Latora, A. Pluchino, A. Rapisarda, “Detection of Complex Networks Modularity by Dynamical Clustering,” Phys. Rev. E 75, 045102 (2007) [physics/0607179].

[13] A. Pluchino, S. Boccaletti, V. Latora, A. Rapisarda, “Opinion dynamics and synchronization in a network of scientific collaborations,” Physica A 372, 316 (2006) [physics/0607210].


[14] J. Gómez-Gardeñes, Y. Moreno, A. Arenas, “Paths to Synchronization on Complex Networks,” Phys. Rev. Lett. 98, 034101 (2007) [cond-mat/0608314].

[15] A. Arenas, A. Díaz-Guilera, C. J. Pérez-Vicente, “Synchronization processes in complex networks,” Physica D 224, 27 (2006) [nlin/0610057].

[16] J. M. Kumpula, J. Saramäki, K. Kaski, J. Kertész, “Limited resolution in complex network community detection with Potts model approach,” Eur. Phys. J. B 56, 41 (2007) [cond-mat/0610370].

[17] A. Arenas, A. Díaz-Guilera, “Synchronization and modularity in complex networks,” Eur. Phys. J. ST 143, 19 (2007) [cond-mat/0610726].

[18] M. Rosvall, C. T. Bergstrom, “An information-theoretic framework for resolving community structure in complex networks,” Proc. Natl. Acad. Sci. 104, 7327 (2007) [physics/0612035].

[19] C. Zhou, J. Kurths, “Hierarchical synchronization in complex networks with heterogeneous degrees,” Chaos 16, 015104 (2006).

[20] K. Wakita, T. Tsurumi, “Finding Community Structure in Mega-scale Social Networks,” [cs/0702048].

[21] A. Arenas, A. Fernández, S. Gómez, “Analysis of the structure of complex networks at different resolution levels,” New J. Phys. 10, 053039 (2008) [physics/0703218].

[22] M. Rosvall, C. T. Bergstrom, “Maps of random walks on complex networks reveal community structure,” Proc. Natl. Acad. Sci. 105, 1118 (2008) [0707.0609].

[23] U. N. Raghavan, R. Albert, S. Kumara, “Near linear time algorithm to detect community structures in large-scale networks,” Phys. Rev. E 76, 036106 (2007) [0709.2938].

[24] A. Pluchino, V. Latora, A. Rapisarda, S. Boccaletti, “Modules identification by a Dynamical Clustering algorithm based on chaotic Rössler oscillators,” AIP Conf. Proc. 965, 323 (2007) [0711.1778].

[25] V. D. Blondel, J. L. Guillaume, R. Lambiotte, E. Lefebvre, “Fast unfolding of communities in large networks,” J. Stat. Mech. P10008 (2008) [0803.0476].

[26] P. Ronhovde, Z. Nussinov, “Local resolution-limit-free Potts model for community detection,” Phys. Rev. E 81, 046114 (2010) [0803.2548].

[27] G. Tibély, J. Kertész, “Note on the equivalence of the label propagation method of community detection and a Potts model approach,” Physica A 387, 4982 (2008) [0803.2804].


[28] A. Arenas, A. Díaz-Guilera, J. Kurths, Y. Moreno, C. Zhou, “Synchronization in complex networks,” Phys. Rep. 469, 93 (2008) [0805.2976].

[29] A. Lancichinetti, S. Fortunato, F. Radicchi, “Benchmark graphs for testing community detection algorithms,” Phys. Rev. E 78, 046110 (2008) [0805.4770].

[30] A. Pluchino, A. Rapisarda, V. Latora, “Communities recognition in the Chesapeake Bay ecosystem by dynamical clustering algorithms based on different oscillators systems,” Proc. Intl. Workshop on Ecological Complex Systems: Stochastic Dynamics and Patterns, 22 (2007) [0806.4276].

[31] I. X. Y. Leung, P. Hui, P. Liò, J. Crowcroft, “Towards real-time community detection in large networks,” Phys. Rev. E 79, 066107 (2009) [0808.2633].

[32] P. Ronhovde, Z. Nussinov, “Multiresolution community detection for megascale networks by information-based replica correlations,” Phys. Rev. E 80, 016109 (2009) [0812.1072].

[33] M. J. Barber, J. W. Clark, “Detecting network communities by propagating labels under constraints,” Phys. Rev. E 80, 026129 (2009) [0903.3138].

[34] A. Lancichinetti, S. Fortunato, “Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities,” Phys. Rev. E 80, 016118 (2009) [0904.3940].

[35] S. Fortunato, “Community detection in graphs,” Phys. Rep. 486, 74 (2010) [0906.0612].

[36] M. Rosvall, D. Axelsson, C. T. Bergstrom, “The map equation,” Eur. Phys. J. ST 178, 13 (2009) [0906.1405].

[37] A. Lancichinetti, S. Fortunato, “Community detection algorithms: a comparative analysis,” Phys. Rev. E 80, 056117 (2009) [0908.1062].

[38] V. A. Traag, P. Van Dooren, Y. Nesterov, “Narrow scope for resolution-limit-free community detection,” Phys. Rev. E 84, 016114 (2011) [1104.3083].

[39] L. Šubelj, M. Bajec, “Unfolding communities in large complex networks: Combining defensive and offensive label propagation for core extraction,” Phys. Rev. E 83, 036103 (2011) [1103.2593].

[40] G. Cordasco, L. Gargano, “Community Detection via Semi-Synchronous Label Propagation Algorithms,” Proceedings of the International Workshop on Business Applications of Social Network Analysis (2010) [1103.4550].

[41] G. Orman, V. Labatut, H. Cherifi, “Qualitative Comparison of Community Detection Algorithms,” Communications in Computer and Information Science 167, 265 (2011) [1207.3603].

Revision: a863de0 (2012-09-13)
