This page reproduces the content of http://www.slideshare.net/shu-t/neurocomputing-121523slideshare.

Uploaded on 2014/01/07 in Technology.


Our paper entitled “Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering” was published in Neurocomputing. This work was done in collaboration with Dr. Issei Sato (Univ. of Tokyo), Dr. Kenichi Kurihara (Google), Prof. Seiji Miyashita (Univ. of Tokyo), and Prof. Hiroshi Nakagawa (Univ. of Tokyo).

http://www.sciencedirect.com/science/article/pii/S0925231213005535

The preprint version is available at:

http://arxiv.org/abs/1305.4325



- Quantum Annealing for Dirichlet Process Mixture Models with Applications to Network Clustering

Issei Sato, Shu Tanaka, Kenichi Kurihara, Seiji Miyashita, and Hiroshi Nakagawa
Neurocomputing 121, 523 (2013)

- Main Results

We considered the efficiency of the quantum annealing method for Dirichlet process mixture models. In this study, Monte Carlo simulation was performed.

[Plot: difference of log-likelihood for the Wikivote network; upper is better.]

- We constructed a method to apply quantum annealing to network clustering.
- Quantum annealing succeeded in obtaining a better solution than conventional methods.
- The number of classes can be changed (cf. K. Kurihara et al. and I. Sato et al., Proceedings of UAI 2009).

- Background

Optimization problem: to find the state (best solution) where a real-valued cost function is minimized.

If the size of the problem is small, we can easily obtain the best solution by brute-force calculation. However, if the size of the problem is large, we cannot obtain the best solution by brute-force calculation in practice. We should develop methods to obtain the best solution (or at least a better solution) efficiently.

- Background
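The brute-force approach just described can be sketched in a few lines. This is our own toy illustration with a simple Ising-chain cost function, not anything from the slides:

```python
import itertools

def chain_cost(spins):
    # Toy cost function: an Ising chain, minus the sum of s_i * s_{i+1}.
    return -sum(a * b for a, b in zip(spins, spins[1:]))

def brute_force_minimum(n):
    # Enumerate all 2^n spin configurations; only practical for small n.
    return min(itertools.product([-1, 1], repeat=n), key=chain_cost)

best = brute_force_minimum(10)  # 2^10 = 1024 states, still easy
```

For n spins the search space has 2^n states, which is exactly why brute force stops being practical as the problem size grows.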

The cost function of most optimization problems can be represented by the Hamiltonian of a classical discrete spin system, so we can use the knowledge of statistical physics: to find the state where the cost function is minimized is to find the ground state of the Hamiltonian.

Simulated annealing (SA): by decreasing the temperature (thermal fluctuation) gradually, the ground state of the Hamiltonian is obtained.

S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, Science 220, 671 (1983).

SA can be adopted in both stochastic methods, such as the Monte Carlo method, and deterministic methods.

- Background
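A minimal sketch of simulated annealing for the same kind of toy spin cost function. The Metropolis acceptance rule and the geometric cooling schedule are our own standard choices, not taken from the slides:

```python
import math
import random

def chain_cost(spins):
    # Toy Ising-chain cost: minus the sum of s_i * s_{i+1}.
    return -sum(a * b for a, b in zip(spins, spins[1:]))

def simulated_annealing(cost, n, steps=20000, t_start=2.0, t_end=1e-3, seed=0):
    """Single-spin-flip Metropolis SA on n Ising spins in {-1, +1}."""
    rng = random.Random(seed)
    spins = [rng.choice([-1, 1]) for _ in range(n)]
    current = cost(spins)
    cooling = (t_end / t_start) ** (1.0 / steps)  # geometric cooling
    temp = t_start
    for _ in range(steps):
        i = rng.randrange(n)
        spins[i] = -spins[i]                      # propose a single spin flip
        delta = cost(spins) - current
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current += delta                      # accept the move
        else:
            spins[i] = -spins[i]                  # reject: undo the flip
        temp *= cooling                           # lower the thermal fluctuation
    return spins, current

spins, value = simulated_annealing(chain_cost, n=20)
```

Gradually lowering the temperature lets the chain escape local minima early on while settling into a low-cost state at the end.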

Quantum annealing (QA): by decreasing the quantum fluctuation gradually, the ground state of the Hamiltonian is obtained.

T. Kadowaki and H. Nishimori, Phys. Rev. E 58, 5355 (1998).
E. Farhi, J. Goldstone, S. Gutmann, J. Lapan, A. Lundgren, and D. Preda, Science 292, 472 (2001).
G. E. Santoro, R. Martonak, E. Tosatti, and R. Car, Science 295, 2427 (2002).

Review articles:
G. E. Santoro and E. Tosatti, J. Phys. A: Math. Gen. 39, R393 (2006).
A. Das and B. K. Chakrabarti, Rev. Mod. Phys. 80, 1061 (2008).
S. Tanaka and R. Tamura, Kinki University Series on Quantum Computing, "Lectures on Quantum Computing, Thermodynamics and Statistical Physics" (2012).

Is QA better than SA?

- What is CRP?
- Chinese Restaurant Process (CRP)

[Figure: a restaurant (the entire set) with tables (data classes) at which customers (data points) are seated.]

The Chinese Restaurant Process (CRP) assigns a probability to the seating arrangement of the customers.

- Chinese Restaurant Process (CRP)

Seating arrangement of the customers: Z = {z_i}_{i=1}^N, where z_i = k means that customer i sits at the k-th table and N is the number of customers.

When customer i enters a restaurant with K occupied tables at which other customers are already seated, customer i sits at a table with the following probability:

p(z_i = k | Z∖z_i; α) = N_k / (α + N − 1)   (k-th occupied table)
p(z_i = new | Z∖z_i; α) = α / (α + N − 1)   (new unoccupied table)

N_k: the number of customers sitting at the k-th table
α: hyperparameter of the CRP

The likelihood of Z is given by

p(Z) = α^{K(Z)} ∏_{k=1}^{K(Z)} (N_k − 1)! / ∏_{n=1}^{N} (α + n − 1).

- What is QACRP?
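The seating rule above translates directly into a sequential sampler; a minimal sketch (the function and variable names are ours):

```python
import random

def crp_sample(n_customers, alpha, seed=0):
    """Sample a CRP seating arrangement: z[i] is the table index of customer i."""
    rng = random.Random(seed)
    z, table_counts = [], []
    for i in range(n_customers):
        # When the (i+1)-th customer enters, i customers are already seated,
        # so the k-th table has weight N_k and a new table has weight alpha,
        # both normalized by alpha + i (i.e. alpha + N - 1 for N = i + 1).
        weights = table_counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_counts):
            table_counts.append(1)   # open a new table
        else:
            table_counts[k] += 1
        z.append(k)
    return z, table_counts

z, counts = crp_sample(100, alpha=1.0)
```

Larger α makes new tables more likely, so the expected number of occupied tables grows (roughly as α·ln N); the number of classes is not fixed in advance.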
- Quantum annealing for CRP (QACRP)

QACRP uses multiple restaurants (m restaurants).

z_{j,i} = k: customer i sits at the k-th table in the j-th restaurant.
Seating arrangement of the customers in the j-th restaurant: Z_j = {z_{j,i}}.

In the j-th restaurant, when customer i enters a restaurant with K occupied tables at which other customers are already seated, customer i sits at a table with the following probability:

p_QA(z_{j,i} = k | {Z_d}_{d=1}^m ∖ {z_{j,i}}; β, Γ) ∝ (N_{j,k} / (α + N − 1))^{β/m} · e^{(c⁻_{j,k}(i) + c⁺_{j,k}(i)) f(β,Γ)}   (k-th occupied table)
p_QA(z_{j,i} = new | {Z_d}_{d=1}^m ∖ {z_{j,i}}; β, Γ) ∝ (α / (α + N − 1))^{β/m}   (new unoccupied table)

β: inverse temperature (thermal fluctuation)
Γ: quantum fluctuation

- Quantum annealing for CRP (QACRP)
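The replica-coupling strength f(β, Γ) in the exponent can be computed directly. Below is a sketch under our reading of the garbled formula, where the classical CRP factor is raised to the power β/m; treat that exponent, and the helper names, as our assumptions:

```python
import math

def f_coupling(beta, gamma, m):
    # f(beta, Gamma) = (1/2) ln coth(beta * Gamma / m).
    # coth(x) > 1 for x > 0, so f > 0; f shrinks as Gamma grows and
    # diverges as Gamma -> 0, forcing neighboring replicas to agree.
    x = beta * gamma / m
    return 0.5 * math.log(1.0 / math.tanh(x))

def occupied_table_weight(n_jk, c_minus, c_plus, alpha, n, beta, gamma, m):
    # Unnormalized weight of the k-th occupied table for customer i in
    # replica j: the classical CRP factor (raised to beta/m in our reading)
    # times the quantum coupling to the (j-1)-th and (j+1)-th replicas.
    classical = (n_jk / (alpha + n - 1)) ** (beta / m)
    return classical * math.exp((c_minus + c_plus) * f_coupling(beta, gamma, m))
```

Decreasing Γ during the run strengthens the coupling between neighboring replicas, which is how the quantum fluctuation is gradually turned off.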

[The same conditional probability as on the previous slide, with:]

c^±_{j,k}(i): the number of customers who sit at the k-th table in the j-th restaurant and share tables with customer i in the (j ± 1)-th restaurant.

[Figure: three neighboring replicas, the (j−1)-th, j-th, and (j+1)-th CRPs, each with its own seating arrangement of customers 1–5.]

The above rule will be proven in the following.

- Quantum annealing for CRP (QACRP)

Bit matrix representation for CRP

A bit matrix B: the adjacency matrix of customers, with B_{i,n} = 1 iff customers i and n share a table. Example (customers {1, 2, 4} at one table, {3, 5} at another):

      1 2 3 4 5
  1   1 1 0 1 0
  2   1 1 0 1 0
  3   0 0 1 0 1
  4   1 1 0 1 0
  5   0 0 1 0 1

B is represented by spins σ̃ = {σ̃_{i,n}}_{i=1,…,N; n=1,…,N}.

Seating conditions:
B_{i,n} = B_{n,i}
B_{i,i} = 1 (i = 1, 2, …, N)
∀i, ℓ: (B_i / |B_i|) · (B_ℓ / |B_ℓ|) = 1 or 0

The sitting arrangement can be represented by the Ising model with constraints.

- Quantum annealing for CRP (QACRP)

Bit matrix representation for CRP

[Figure: the example bit matrix again. The second row/column marks the customers who share a table with customer 2, and a set of bit matrices shows the states that customer 2 can take under the seating conditions.]

- Quantum annealing for CRP (QACRP)
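The bit-matrix representation and its seating conditions can be checked mechanically. A sketch that reproduces the 5-customer example shown earlier:

```python
def bit_matrix(z):
    """B[i][n] = 1 iff customers i and n share a table (hence B[i][i] = 1)."""
    size = len(z)
    return [[1 if z[i] == z[n] else 0 for n in range(size)] for i in range(size)]

def satisfies_seating_conditions(B):
    size = len(B)
    # B must have a unit diagonal and be symmetric.
    if any(B[i][i] != 1 for i in range(size)):
        return False
    if any(B[i][n] != B[n][i] for i in range(size) for n in range(size)):
        return False
    # Normalized rows must overlap completely or not at all:
    # two rows are either identical (same table) or orthogonal.
    for i in range(size):
        for n in range(size):
            dot = sum(a * b for a, b in zip(B[i], B[n]))
            if dot != 0 and B[i] != B[n]:
                return False
    return True

# Customers {1, 2, 4} share one table and {3, 5} another (0-indexed z below).
B = bit_matrix([0, 0, 1, 0, 1])
```

Any valid seating arrangement yields a matrix satisfying the conditions, and violating one entry breaks them, which is what makes the constrained Ising representation possible.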

Density matrix representation for “classical” CRP

H_c = diag[E(σ^{(1)}), E(σ^{(2)}), …, E(σ^{(2^{N²})})]

E(σ^{(ℓ)}) = −ln p(σ^{(ℓ)})   (σ^{(ℓ)} ∈ S̃)
E(σ^{(ℓ)}) = +∞              (σ^{(ℓ)} ∈ S ∖ S̃)

p(σ) = σ^⊤ e^{−H_c} σ / Tr e^{−H_c} =: σ^⊤ e^{−H_c} σ / Z_c

The sitting arrangement can be represented by the Ising model with constraints.

- Quantum annealing for CRP (QACRP)

Formulation for quantum CRP

H = H_c + H_q
H_c: classical CRP
H_q: quantum fluctuation

p_QA(σ; β, Γ) = σ^⊤ e^{−β(H_c+H_q)} σ / Tr e^{−β(H_c+H_q)}

Classical CRP:

p(σ̃_i | σ ∖ σ̃_i) = σ^⊤ e^{−βH_c} σ / Σ_{σ̃_i} σ^⊤ e^{−βH_c} σ

p(z_i = k | Z∖z_i; α) = N_k / (α + N − 1)   (k-th occupied table)
p(z_i = new | Z∖z_i; α) = α / (α + N − 1)   (new unoccupied table)

- Quantum annealing for CRP (QACRP)

Formulation for quantum CRP

H = H_c + H_q
H_c: classical CRP
H_q: quantum fluctuation

p_QA(σ; β, Γ) = σ^⊤ e^{−β(H_c+H_q)} σ / Tr e^{−β(H_c+H_q)}

Quantum CRP:

p_QA(σ̃_i | σ ∖ σ̃_i; β, Γ) = σ^⊤ e^{−β(H_c+H_q)} σ / Σ_{σ̃_i} σ^⊤ e^{−β(H_c+H_q)} σ

Transverse field as a quantum fluctuation:

H_q = −Γ Σ_{i=1}^{N} Σ_{n=1}^{N} σ^x_{i,n},   E = ((1, 0), (0, 1)),   σ^x = ((0, 1), (1, 0))

- Quantum annealing for CRP (QACRP)

Approximation inference for QACRP

By the Suzuki-Trotter decomposition, p_QA can be approximately expressed by the classical CRP:

p_QA(σ; β, Γ) = σ^⊤ e^{−β(H_c+H_q)} σ / Tr e^{−β(H_c+H_q)}
            = Σ_{σ_j (j ≥ 2)} p_QA-ST(σ, σ_2, …, σ_m; β, Γ) + O(β²/m)

p_QA-ST(σ_1, σ_2, …, σ_m; β, Γ) = ∏_{j=1}^{m} e^{−(β/m)E(σ_j)} e^{f(β,Γ) s(σ_j, σ_{j+1})} / Z(β, Γ)

f(β, Γ) = (1/2) ln coth(βΓ/m)

s(σ_j, σ_{j+1}) = Σ_{i=1}^{N} Σ_{n=1}^{N} δ(σ̃_{j,i,n}, σ̃_{j+1,i,n})

Z(β, Γ) = [sinh(2βΓ/m)]^{N²/2} Σ_σ e^{−(β/m)E(σ)}

- I. Sato et al. / Neurocomputing 121 (2013) 523–531, p. 527 (paper page embedded in the slides)

Experiments — Network model & dataset

- Citeseer: a citation network dataset for 2110 papers.
- Netscience: a coauthorship network of scientists working on network science; 1589 scientists.
- Wikivote: a bipartite network constructed using administrator elections; 7115 Wikipedia users.

Text of the embedded page:

We consider multiple running CRPs in which σ_j (j = 1, …, m) indicates the seating arrangement of the j-th CRP and represents the j-th bit matrix B_j. We correspond B_{j,i,n} = 1 to σ̃_{j,i,n} = (1, 0)^⊤ and B_{j,i,n} = 0 to σ̃_{j,i,n} = (0, 1)^⊤, which means that we can represent B_j as σ_j by using Eq. (5). We derive the following theorem:

Theorem 3.1. p_QA(σ; β, Γ) in Eq. (10) is approximated by the Suzuki–Trotter expansion as follows:

p_QA(σ; β, Γ) = (1/Z) σ^⊤ e^{−β(H_c+H_q)} σ = Σ_{σ_j (j≥2)} p_QA-ST(σ, σ_2, …, σ_m; β, Γ) + O(β²/m),  (15)

where we rewrite σ as σ_1, and

p_QA-ST(σ_1, σ_2, …, σ_m; β, Γ) = ∏_{j=1}^{m} (1/Z(β, Γ)) e^{−(β/m)E(σ_j)} e^{f(β,Γ) s(σ_j, σ_{j+1})},  (16)

f(β, Γ) = (1/2) log coth(βΓ/m),  (17)

s(σ_j, σ_{j+1}) = Σ_{i=1}^{N} Σ_{n=1}^{N} δ(σ̃_{j,i,n}, σ̃_{j+1,i,n}),  (18)

Z(β, Γ) = [sinh(2βΓ/m)]^{N²/2} Σ_σ e^{−(β/m)E(σ)}.  (19)

Note that σ_{m+1} = σ_1. The proof is given in Appendix A. Note that our derived f in Eq. (17) does not include the number of classes, K, whereas the f in existing work [12,20] is formulated by using a fixed K.

Eq. (15) is interpreted as follows. p_QA(σ; β, Γ) is approximated by marginalizing out the other states {σ_j}_{j≥2} of p_QA-ST(σ_1, σ_2, …, σ_m; β, Γ). As shown in Eq. (16), p_QA-ST(σ_1, σ_2, …, σ_m; β, Γ) looks like the joint probability of the states of m dependent CRPs. In Eq. (16), e^{−(β/m)E(σ_j)} corresponds to the classical CRP with inverse temperature β/m, and e^{f(β,Γ)s(σ_j,σ_{j+1})} indicates the quantum effect part. If f(β, Γ) = 0, which means the CRPs are independent, p_QA-ST(σ_1, σ_2, …, σ_m; β, Γ) is equal to the product of the probabilities of m classical CRPs. s(σ_j, σ_{j+1}) (> 0) is regarded as a similarity function between the j-th and (j+1)-th bit matrices. If they are the same matrices, then s(σ_j, σ_{j+1}) = N². In Eq. (2), log p_SA(σ_j) corresponds to log e^{−(β/m)E(σ_j)}/Z, and the regularizer term f · R(σ_1, …, σ_m) is log ∏_{j=1}^{m} e^{f(β,Γ)s(σ_j,σ_{j+1})} = f(β, Γ) Σ_{j=1}^{m} s(σ_j, σ_{j+1}). Note that we aim at deriving the approximation inference for p_QA(σ̃_i | σ ∖ σ̃_i; β, Γ) in Eq. (13). Using Theorem 3.1, we can derive Eq. (4) as the approximation inference. The details of the derivation are provided in Appendix B.

4. Experiments

We evaluated QA in a real application. We applied QA to a DPM model for clustering vertices in a network where a seating arrangement of the CRP indicates a network partition.

4.1. Network model

We used the Newman model [17] for network modeling in this work. The Newman model is a probabilistic generative network model. This model is flexible, which enables researchers to analyze observed graph data without specifying the network structure (disassortative or assortative) in advance. In an assortative network, such as a social network, the members (vertices) of each class are mostly connected to the other members of the same class. The communication between members in three social groups is illustrated in Fig. 5a, where one sees that the members generally communicate more with others in the same group than they do with those outside the group. In a disassortative network, the members (vertices) have most of their connections outside their class. An election network of supporters and candidates is illustrated in Fig. 5b, where a link indicates support for a candidate. The Newman model can model not only these two kinds of networks but also a mixture of them, such as a citation network (see Fig. 5c), but the user must decide the number of classes in advance. We therefore used the DPM extension of the Newman model as described in Appendix C.

Fig. 5. Examples of network structures. (a) Social network (assortative network), (b) election network (disassortative network) and (c) citation network (mixture of assortative and disassortative network).

- Experiments

Annealing schedule

m: Trotter number (the number of replicas); m = 16.

We tested several schedules of the inverse temperature:
β = β_0 ln(1 + t) or β = β_0 t, with β_0 = 0.2m, 0.4m, 0.6m (t: t-th iteration).
β = 0.4m·t is a better schedule in SA (MAP estimation).

Γ = mT/(β_0 t) is a schedule of the quantum fluctuation (T: total number of iterations).

- Results
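The tested β schedules are easy to tabulate. A sketch, where the m = 16 and T = 30 values are taken from these slides and everything else is illustration:

```python
import math

def beta_linear(t, m, c=0.4):
    # beta = beta_0 * t with beta_0 = c * m; c in {0.2, 0.4, 0.6} was tested.
    return c * m * t

def beta_log(t, m, c=0.4):
    # beta = beta_0 * ln(1 + t), the alternative schedule on the slide.
    return c * m * math.log(1.0 + t)

m, T = 16, 30
linear = [beta_linear(t, m) for t in range(1, T + 1)]
logarithmic = [beta_log(t, m) for t in range(1, T + 1)]
# Both schedules increase beta with t, i.e. lower the thermal fluctuation;
# the linear one with beta_0 = 0.4 m was the better SA schedule.
```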

[Plot: difference of log-likelihood for the Wikivote network; upper is better.]

L_max^Beam: the maximum log-likelihood of the beam search.
L_max^16SAs: the maximum log-likelihood of 16 CRPs in SA.

- I. Sato et al. / Neurocomputing 121 (2013) 523–531 (paper page embedded in the slides)

Results

Citeseer
[Plot: difference of log-likelihood for the Citeseer network.]

Netscience
[Plot: difference of log-likelihood for the Netscience network.]

A better solution can be obtained by QA.

(The slide again embeds p. 527 of I. Sato et al. / Neurocomputing 121 (2013) 523–531.)

- Results

Citeseer
  calc. time: SA (T=30, m=1): 13 sec.; QA (T=30, m=16): 15 sec.
  # classes: 16 SAs: 35; 1600 SAs: 30; beam search: 57; QA (m=16): 37
  [Plot: difference of log-likelihood for Citeseer.]

Netscience
  calc. time: SA (T=30, m=1): 22 sec.; QA (T=30, m=16): 25 sec.
  # classes: 16 SAs: 22; 1600 SAs: 65; beam search: 61; QA (m=16): 26
  [Plot: difference of log-likelihood for Netscience.]

Wikivote
  calc. time: SA (T=30, m=1): 76 sec.; QA (T=30, m=16): 79 sec.
  # classes: 16 SAs: 8; 1600 SAs: 8; beam search: 27; QA (m=16): 8
  [Plot: difference of log-likelihood for Wikivote.]

- Main Results

We considered the efficiency of the quantum annealing method for Dirichlet process mixture models. In this study, Monte Carlo simulation was performed.

[Plot: difference of log-likelihood for the Wikivote network; upper is better.]

- We constructed a method to apply quantum annealing to network clustering.
- Quantum annealing succeeded in obtaining a better solution than conventional methods.
- The number of classes can be changed (cf. K. Kurihara et al. and I. Sato et al., Proceedings of UAI 2009).

- Thank you!

Issei Sato, Shu Tanaka, Kenichi Kurihara,

Seiji Miyashita, and Hiroshi Nakagawa

Neurocomputing 121, 523 (2013)