This page reproduces the contents of http://www.slideshare.net/hirokoonari/suicide-ideation-of-individuals-in-online-social-networks-tokyo-webmining.


Uploaded about 4 years ago (2012/08/25) in the Learning category.

suicide ideation of individuals in online social networks


- Suicide ideation of individuals in online social networks

N. Masuda, I. Kurahashi and H. Onari, arXiv:1207.2548, 2012

Hiroko Onari

#TokyoWebmining

26th of August, 2012

- What is a social network?

• A graph that represents relationships (ties, links) between independent users (nodes)

- Directed networks

where ties have a direction

e.g.) online directed networks

  - Twitter
  - Google+
  - YouTube
  - Flickr

Undirected networks

where ties have no direction

e.g.) online undirected networks

  - mixi
  - Facebook
  - Skype
  - LinkedIn

- What is suicide? - association with social isolation -

• Suicide is defined as all cases of death resulting directly or indirectly from a positive (e.g., shooting oneself) or negative (e.g., refusing to eat) act of the victim himself, which he knows will produce this result. [Durkheim, 1951]

• Suicide is neither an individual act nor a personal action. The force that determines suicide is not psychological but social. Suicide is the result of social disorganization or a lack of social integration or social solidarity. [Durkheim, 1951]

=> Social Isolation

- Social network analysis on suicide & social isolation

• A small number of friends and a small fraction of triangles to which an individual belongs are forms of social isolation that significantly contribute to suicide ideation (from studies of the relationship between suicidal behavior and egocentric social networks among adolescents). [Bearman & Moody, 2004] [Cui et al., 2010]

• The paucity of triangles, or intransitivity, also characterizes social isolation. [Wasserman & Faust, 1994] [Bearman & Moody, 2004]

• Individuals without triangles are considered to lack membership in a social group even if they have many friends. [Krackhardt, 1999]

- Social Statistics by OECD

Japan's suicide rate per 100,000 persons is higher than that of any other OECD country.

[Figure: Suicide rates per 100,000 persons per year, 1960 - 2006, for Denmark, Greece, Hungary, Ireland, Japan, Switzerland and the OECD average; vertical axis from 0 to 45.]

- Research Questions

• From the perspective of network science, can quantitative analysis of an online social network reveal indications that could help reduce suicide?

• Can we say that online social networks reflect real personal relationships?

- Data

• Social network of 2.7 × 10^7 registered users from mixi as of March 2012.

• More than 4.5 × 10^6 user-defined communities on various topics in mixi as of April 2012. A community is a group of users who share a common interest such as a hobby. User-defined communities are a distinctive feature that other major SNSs like Facebook do not have.

• mixi is a major SNS in Japan, launched in 2004.

- Analysis Environment

[Diagram: edge list, personal info. and community info. -> Perl -> Tokyo Cabinet on the analysis computer -> clustering coefficient -> result report]

Edge list (ID, friend's ID):
1, 2
1, 3
2, 1
2, 4
2, 5

Tokyo Cabinet (ID: friend's IDs):
1: 2, 3
2: 1, 4, 5

Output (ID, clustering coefficient):
1, 0.4
2, 0.2

We calculated the clustering coefficient using Tokyo Cabinet, a library of routines for managing a key-value store database. Perl was used as the API for Tokyo Cabinet. Data analysis was implemented in R.
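The regrouping from edge-list rows to one "ID: friend's IDs" record per user can be sketched in Python as follows. This is only an illustration under assumptions: the original pipeline used Perl with Tokyo Cabinet, and the function name `build_adjacency` and the in-memory dictionary standing in for the key-value store are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edge_rows):
    """Group (user_id, friend_id) rows into {user_id: sorted friend ids}."""
    adjacency = defaultdict(set)
    for user_id, friend_id in edge_rows:
        adjacency[user_id].add(friend_id)
    # A plain dict plays the role of the key-value store in this sketch.
    return {user: sorted(friends) for user, friends in adjacency.items()}


# Edge-list rows as shown on the slide (ID, friend's ID).
edge_rows = [(1, 2), (1, 3), (2, 1), (2, 4), (2, 5)]
adjacency = build_adjacency(edge_rows)
print(adjacency)  # {1: [2, 3], 2: [1, 4, 5]}
```

Keeping one record per user means that a user's whole neighborhood can be fetched with a single key lookup, which is the access pattern the clustering coefficient needs.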

Irrelevant private information was deleted, and relevant information was encrypted. We conducted all analysis in the Tokyo HQ office of mixi using a computer that was not connected to the Internet.

- Sampling Procedure (1/2)

• Suicide seed sample (9990 users): We selected 4 suicide-related communities according to the following criteria:

(1) the name of the user-defined community includes the word suicide (jisatsu in Japanese)
(2) at least 1000 members
(3) at least 100 comments posted for each topic
(4) at least 3 independent topics on which comments were made in October 2011
(5) admission to the community is open to the public

* excluded communities that concentrated on methods of committing suicide and communities that encouraged members to live with hope
* discarded users with 0 or 1 friend on mixi
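The five numbered criteria and the friend-count rule can be read as a simple filter. The sketch below is an assumption-laden illustration, not the authors' code: the `Community` record, all of its field names, and the sample communities are hypothetical, and the content-based exclusions (which require reading the community) are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Community:
    # All field names are hypothetical stand-ins for the five criteria.
    name: str
    members: int
    comments_per_topic: int
    topics_with_comments_oct_2011: int
    open_to_public: bool


def meets_criteria(community, keyword="jisatsu"):
    """Return True if a community passes criteria (1) to (5)."""
    return (
        keyword in community.name
        and community.members >= 1000
        and community.comments_per_topic >= 100
        and community.topics_with_comments_oct_2011 >= 3
        and community.open_to_public
    )


def keep_user(friend_count):
    """Users with 0 or 1 friend are discarded."""
    return friend_count >= 2


# Made-up candidates, only to exercise the filter.
candidates = [
    Community("jisatsu talk", 2500, 150, 4, True),
    Community("jisatsu small", 300, 150, 4, True),     # too few members
    Community("jisatsu closed", 2500, 150, 4, False),  # not open to public
]
selected = [c.name for c in candidates if meets_criteria(c)]
print(selected)  # ['jisatsu talk']
```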

Then we sampled from the suicidal communities the 9990 active users who existed as of January 23, 2012 and had logged on to mixi on more than 20 days per month on average from August through December 2011.

- Sampling Procedure (2/2)

• Depression seed sample (24410 users): We selected 7 depression-related communities using criteria similar to those for the suicide seed sample. The difference is that the name of the user-defined community includes the word depression (utsu in Japanese). We then sampled 24410 active users from the depression communities.

• Control seed sample (228949 users): A random sample of active users who had at least 2 friends and had not joined the suicidal or depression-related communities.

- Measurements of a social network

• The following network indicators were adopted.

  - degree
  - degree distribution
  - clustering coefficient
  - homophily

- Degree

• Degree is the number of neighbors (i.e., friends), denoted by k_i for user i. A small degree is an indicator of social isolation.

- Degree distribution

• It is known that the degree distributions of human relationships are long-tailed. Most people have a relatively small degree, but a few people have a very large degree, being connected to many other people.

- Clustering coefficient

• The clustering coefficient is a measure of the number of triangles in a network. In social networks, when the clustering coefficient is large, the user is considered to be embedded in close-knit social groups (Wasserman & Faust, 1994; Watts & Strogatz, 1998; Newman, 2010). A small value is an indicator of social isolation.

• The clustering coefficient can be measured in two ways: the global clustering coefficient (often called transitivity) and the local clustering coefficient. The global measure gives an overall indication of the clustering in the network, whereas the local measure gives an indication of the embeddedness of single nodes.

• In our research, we use the local clustering coefficient.

* The local clustering coefficient is often used in network science (complex networks), and the global value is often used in sociology.

- Local clustering coefficient for undirected networks

• The local clustering coefficient $C_i$ for each vertex (user) $v_i$ is defined by

$$C_i \equiv \frac{\text{number of triangles connected to } v_i}{k_i (k_i - 1)/2}.$$

* By definition, $0 \le C_i \le 1$.
* $k_i$ is the degree of user $v_i$.
* Users who have 0 or 1 friend ($k_i = 0$ or $1$) should/can be removed, because the denominator is zero for them.
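A minimal Python sketch of this definition, assuming the undirected network is held as a dictionary that maps each user to the set of their friends. The function name `local_clustering` and the toy network are hypothetical; the study itself computed the coefficient with Perl and Tokyo Cabinet.

```python
from itertools import combinations


def local_clustering(adjacency, node):
    """Local clustering coefficient C_i, or None when k_i is 0 or 1."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return None  # undefined: the denominator k(k-1)/2 is zero
    # A triangle through `node` is a pair of its friends who are friends too.
    triangles = sum(
        1 for u, v in combinations(neighbors, 2) if v in adjacency[u]
    )
    return triangles / (k * (k - 1) / 2)


# Toy undirected network: every tie is listed in both directions.
adjacency = {
    1: {2, 3, 4},
    2: {1, 3},
    3: {1, 2},
    4: {1},
}
print(round(local_clustering(adjacency, 1), 3))  # 0.333
print(local_clustering(adjacency, 2))            # 1.0
print(local_clustering(adjacency, 4))            # None
```

User 1 has three friends, so three pairs of friends are possible, and only the pair (2, 3) is connected, which gives 1/3.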

• The average of the local clustering coefficient is defined by

$$C \equiv \frac{1}{N} \sum_{i=1}^{N} C_i.$$

* By definition, $0 \le C \le 1$.
* $N$ is the total number of users in the network, excluding those with $k_i = 0$ or $1$.

- Degree and clustering coefficient

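• The influence of $k_i$ and $C_i$ needs to be distinguished carefully. Here is an example: two people each have 5 friends, but the number of links among their friends differs.

[Figure: two ego networks, each with $k_i = 5$; in one the friends share 0 links, in the other 3 links.]

$$C_i = \frac{0}{5(5-1)/2} = 0, \qquad C_i = \frac{3}{5(5-1)/2} = \frac{3}{10}.$$

- Degree and clustering coefficient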
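As a quick numerical check of the two five-friend cases, the definition can be evaluated directly from the degree and the number of links among the friends. The helper name `clustering_from_counts` is hypothetical and only serves this illustration.

```python
def clustering_from_counts(degree, links_between_friends):
    """C_i = (links among the friends) / (k_i (k_i - 1) / 2)."""
    possible_pairs = degree * (degree - 1) / 2
    return links_between_friends / possible_pairs


# Same degree, different embeddedness.
print(clustering_from_counts(5, 0))  # 0.0
print(clustering_from_counts(5, 3))  # 0.3
```

Both users would look identical if only the degree were recorded, which is why the two variables are entered separately in the regressions.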

• Each data point C(k) is obtained by averaging C_i over the users with degree k. Large fluctuations of C(k) at large k values are caused by the paucity of users having large k. C_i decreases with k in many networks (Newman, 2010).

- Homophily

• Similar individuals are more likely to become friends. This is called homophily. In this study, we adopt the fraction of neighbors with suicide ideation as the homophily variable.

• It should be noted that, if a user has relatively many friends with suicide ideation, it does not necessarily imply that suicide is contagious. Homophily may be a cause of such assortativity.

• FYI: Some research differentiates the effects of influence and homophily (Aral et al., 2009; Shalizi & Thomas, 2011).

* My presentation on the effects of influence and homophily, based on Aral's paper, is on SlideShare: http://www.slideshare.net/hirokoonari/ss-13221508

- Homophily
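The homophily variable, the fraction of a user's neighbors who belong to the suicide (or depression) group, could be computed as in the following minimal sketch. The function name and the user IDs are hypothetical, and treating group membership as a plain set lookup is an assumption.

```python
def neighbor_fraction_in_group(friends, group_members):
    """Fraction of `friends` that belong to `group_members`."""
    if not friends:
        return None  # undefined for a user without friends
    in_group = sum(1 for friend in friends if friend in group_members)
    return in_group / len(friends)


# Made-up example: one of four friends is in the suicide group.
friends_of_user = {"u2", "u3", "u4", "u5"}
suicide_group = {"u3", "u9"}
print(neighbor_fraction_in_group(friends_of_user, suicide_group))  # 0.25
```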

• In this study, users in the suicide group have comparatively more similar friends than users in the control group. The same tendency holds for users in the depression-related group.

- Independent variables

Personal variables

  - Age
  - Gender

Local network variables

  - degree: number of neighbors (friends)
  - local clustering coefficient: undirected clustering coefficient
  - Homophily: fraction of neighbors who belong to the suicide / depression community

Behavioral variables in mixi

  - Community number: number of communities to which a user belongs
  - Registration period: number of days between the registration date and Jan. 23, 2012

- Statistical models

• Univariate and multivariate logistic regressions: estimating the likelihood of belonging to a suicidal or a depressive community

• VIF (variance inflation factor): checking the multicollinearity between independent variables to justify the use of the multivariate logistic regression. The recommended VIF value is smaller than 10 (preferably smaller than 5).

• Pearson, Spearman, and Kendall correlation coefficients: measuring correlation between the independent variables

• AUC (area under the receiver operating characteristic curve): quantifying the explanatory power of the logistic model. The AUC value falls between 0.5 (no better than chance) and 1 (perfect discrimination). A large AUC value indicates that the logistic regression fits well.

- Univariate statistics of independent variables for the suicide and control groups

| Variable | Suicide group (N = 9,990) Mean±SD | Suicide group Range (min, max) | Control group (N = 228,949) Mean±SD | Control group Range (min, max) | p-value |
|---|---|---|---|---|---|
| Age | 27.4±10.3 | (17, 97) | 27.7±9.2 | (14, 96) | 0.000652 |
| Community number | 283.7±284.3 | (1, 1000) | 46.3±79.4 | (1, 1000) | < 0.0001 |
| k_i | 82.9±98.7 | (2, 1000) | 65.8±67.6 | (2, 1000) | < 0.0001 |
| C_i | 0.087±0.097 | (0, 1) | 0.150±0.138 | (0, 1) | < 0.0001 |
| Homophily (suicide) | 0.0110±0.0329 | (0, 1.000) | 0.0012±0.0080 | (0, 0.667) | < 0.0001 |
| Registration period | 1235.7±638.9 | (122, 2878) | 1333.5±670.5 | (102, 2891) | < 0.0001 |
| Gender (female) | 5,786 (57.9%) | | 126,941 (55.4%) | | < 0.0001 |
| No. suicidal communities | 1.20±0.51 | (1, 4) | N/A | N/A | N/A |
| No. login days | 28.9±4.4 | (1, 31) | 26.9±6.3 | (1, 31) | < 0.0001 |

- Multivariate logistic regression of suicide ideation on individual and network variables

| Variable | OR | CI | p-value | VIF |
|---|---|---|---|---|
| Age | 1.00463 | (1.00211, 1.00716) | 0.000313 | 1.091 |
| Gender (female = 1) | 0.821 | (0.783, 0.861) | < 0.0001 | 1.028 |
| Community number | 1.00733 | (1.00720, 1.00747) | < 0.0001 | 1.197 |
| k_i | 0.99790 | (0.99758, 0.99821) | < 0.0001 | 1.156 |
| C_i | 0.0093 | (0.0069, 0.0126) | < 0.0001 | 1.081 |
| Homophily (suicide) | 2.22 × 10^12 | (0.57 × 10^12, 8.65 × 10^12) | < 0.0001 | 1.016 |
| Registration period | 0.999383 | (0.999346, 0.999420) | < 0.0001 | 1.135 |

* OR: odds ratio; CI: 95% confidence interval; VIF: variance inflation factor

AUC: 0.873

On average, the odds of belonging to the suicide group rather than the control group are multiplied by:

- 1.00463 for a user who is one year older
- 0.821 for being female
- 1.00733 for membership in one additional community
- 0.99790 for having one additional friend
- 0.0093^0.01 = 0.95 for an increase in C_i by 0.01
- (2.22 × 10^12)^0.01 = 1.33 for an increase in the fraction of friends in the suicide group by 0.01
- 0.999383 for one additional day of the registration period
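The rescaling used for C_i and the homophily variable follows from the logistic model: an odds ratio reported per unit of a variable becomes OR^d for a change of d, because exp(b·d) = (exp b)^d. A short sketch to reproduce the two figures, where the helper name `rescale_odds_ratio` is hypothetical:

```python
def rescale_odds_ratio(odds_ratio_per_unit, step):
    """Odds ratio for a change of `step` units: OR ** step."""
    return odds_ratio_per_unit ** step


# Per-unit odds ratios taken from the multivariate table.
print(round(rescale_odds_ratio(0.0093, 0.01), 2))   # 0.95
print(round(rescale_odds_ratio(2.22e12, 0.01), 2))  # 1.33
```

Because C_i and the homophily fraction both lie between 0 and 1, a full unit change is the entire range of the variable, so the per-unit odds ratios look extreme while a 0.01 step gives interpretable values.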

AUC is large, so this logistic regression fits well.

- Correlation coefficients between pairs of independent variables

| Variable 1 | Variable 2 | Control P | Control S | Control K | Suicide P | Suicide S | Suicide K | Depression P | Depression S | Depression K |
|---|---|---|---|---|---|---|---|---|---|---|
| Age | Gender | −.053 | −.026 | −.022 | −.094 | −.137 | −.116 | −.166 | −.174 | −.145 |
| Age | Community number | −.032 | .023 | .015 | −.045 | −.105 | −.073 | −.089 | −.131 | −.091 |
| Age | k_i | −.279 | −.385 | −.271 | −.103 | −.224 | −.157 | −.168 | −.268 | −.187 |
| Age | C_i | .041 | −.152 | −.111 | −.048 | −.220 | −.154 | −.092 | −.273 | −.192 |
| Age | Homophily (suicide) | −.011 | −.090 | .074 | .031 | −.037 | −.029 | N/A | N/A | N/A |
| Age | Homophily (depression) | −.007 | −.083 | −.066 | N/A | N/A | N/A | .166 | .121 | −.089 |
| Age | Registration period | .278 | .460 | .337 | .159 | .356 | .259 | .203 | .364 | .266 |
| Gender | Community number | .110 | .116 | .095 | .205 | .204 | .166 | .086 | .083 | .068 |
| Gender | k_i | .015 | .014 | .011 | .048 | .046 | .038 | .048 | .046 | .038 |
| Gender | C_i | −.084 | −.085 | −.069 | −.109 | −.097 | −.080 | −.061 | −.030 | −.024 |
| Gender | Homophily (suicide) | −.012 | −.017 | −.017 | −.007 | .031 | .028 | N/A | N/A | N/A |
| Gender | Homophily (depression) | .000 | .009 | .008 | N/A | N/A | N/A | −.053 | −.021 | −.018 |
| Gender | Registration period | .025 | .025 | .020 | −.064 | −.061 | −.050 | −.078 | −.079 | −.065 |
| Community number | k_i | .375 | .372 | .258 | .348 | .338 | .231 | .375 | .360 | .248 |
| Community number | C_i | −.376 | −.399 | −.277 | −.231 | −.200 | −.136 | −.201 | −.171 | −.116 |
| Community number | Homophily (suicide) | .027 | .113 | .091 | −.034 | .140 | .105 | N/A | N/A | N/A |
| Community number | Homophily (depression) | .038 | .166 | .132 | N/A | N/A | N/A | −.150 | .034 | .025 |
| Community number | Registration period | .339 | .338 | .230 | .166 | .152 | .102 | .187 | .172 | .115 |
| k_i | C_i | −.363 | −.248 | −.175 | −.251 | −.116 | −.085 | −.240 | −.105 | −.074 |
| k_i | Homophily (suicide) | −.013 | .191 | .150 | −.175 | .174 | .107 | N/A | N/A | N/A |
| k_i | Homophily (depression) | −.027 | .254 | .188 | N/A | N/A | N/A | −.210 | .076 | .029 |
| k_i | Registration period | .102 | .081 | .055 | .170 | .154 | .103 | .172 | .152 | .101 |
| C_i | Homophily (suicide) | −.026 | −.100 | −.080 | −.047 | −.213 | −.162 | N/A | N/A | N/A |
| C_i | Homophily (depression) | −.031 | −.145 | −.114 | N/A | N/A | N/A | −.055 | −.243 | −.182 |
| C_i | Registration period | −.221 | −.249 | −.168 | −.143 | −.112 | −.162 | −.133 | −.099 | −.068 |
| Homophily (suicide) | Registration period | −.039 | −.031 | −.025 | −.104 | −.059 | −.044 | N/A | N/A | N/A |
| Homophily (depression) | Registration period | −.024 | .011 | .009 | N/A | N/A | N/A | −.120 | −.049 | −.036 |

* P: Pearson; S: Spearman; K: Kendall correlation coefficients
* > 0.2 (highlight marker on the original slide)
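For reference, the three coefficients reported in the table can be sketched in plain Python as below. This is a minimal illustration, not the analysis code (the slides state that data analysis was implemented in R): the function names are hypothetical, the sample values are made up, and the Kendall function is the simple tau-a variant, which does not correct for ties.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    balance = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            balance += (product > 0) - (product < 0)
    return balance / (n * (n - 1) / 2)


# Made-up values: degree versus local clustering coefficient.
degree = [2, 5, 10, 40, 100]
clustering = [0.9, 0.5, 0.6, 0.2, 0.1]
print(round(spearman(degree, clustering), 3))       # -0.9
print(round(kendall_tau_a(degree, clustering), 3))  # -0.8
```

Spearman and Kendall depend only on the ordering of the values, which makes them less sensitive than Pearson to the long-tailed variables in this data set.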

These correlation coefficients are sufficiently small.

- Univariate logistic regression of suicide ideation on individual and network variables

| Variable | OR | CI | p-value | AUC |
|---|---|---|---|---|
| Age | 0.99604 | (0.99377, 0.99832) | 0.000651 | 0.515 |
| Gender (female = 1) | 1.106 | (1.062, 1.152) | < 0.0001 | 0.512 |
| Community number | 1.00728 | (1.00716, 1.00741) | < 0.0001 | 0.867 |
| k_i | 1.00259 | (1.00237, 1.00280) | < 0.0001 | 0.549 |
| C_i | 0.000581 | (0.000428, 0.000789) | < 0.0001 | 0.690 |
| Homophily (suicide) | 1.57 × 10^16 | (0.41 × 10^16, 6.08 × 10^16) | < 0.0001 | 0.643 |
| Registration period | 0.999783 | (0.999753, 0.999813) | < 0.0001 | 0.545 |

* OR: odds ratio; CI: 95% confidence interval; AUC: area under the curve
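The AUC column can be read as the probability that a randomly chosen user from the suicide group receives a higher model score than a randomly chosen control user. A minimal sketch of that pairwise definition, with a hypothetical function name and made-up scores (the actual analysis was done in R):

```python
def auc(scores_positive, scores_negative):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Made-up predicted probabilities for 3 group members and 4 controls.
suicide_scores = [0.9, 0.8, 0.4]
control_scores = [0.7, 0.3, 0.2, 0.1]
print(round(auc(suicide_scores, control_scores), 3))  # 0.917
```

The double loop is quadratic in the sample size, so it is meant for illustration only; for samples as large as those in this study a rank-based computation would be the practical choice.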

- The community number makes by far the largest contribution among the seven independent variables (AUC 0.867).
- The clustering coefficient has the second largest explanatory power (AUC 0.690). This result is consistent with a previous one (Bearman & Moody, 2004).
- Homophily has the third largest explanatory power (AUC 0.643).

- Conclusions

• Online social behavior of users, rather than their demographic properties, explains suicide ideation. The following factors contribute to suicide ideation by the largest amounts:

  - increase in the community number
  - decrease in the local clustering coefficient
  - increase in the homophily variable

• That age and gender have little influence on suicide ideation is inconsistent with previous findings (Wray et al., 2011).

• That the degree explains little of suicide ideation is inconsistent with previous studies (Bearman & Moody, 2004; Cui et al., 2010).

• User-defined communities of mixi cover virtually all major topics. Applying the present methods to other topics could be a profitable future study.

- Appendix - Analysis of depressive symptoms -

- Univariate statistics of independent variables for the depression and control groups

| Variable | Depression group (N = 24,410) Mean±SD | Depression group Range (min, max) | Control group (N = 228,949) Mean±SD | Control group Range (min, max) | p-value |
|---|---|---|---|---|---|
| Age | 28.8±9.4 | (16, 97) | 27.7±9.2 | (14, 96) | < 0.0001 |
| Community number | 249.6±263.1 | (1, 1000) | 46.3±79.4 | (1, 1000) | < 0.0001 |
| k_i | 81.9±88.1 | (2, 1000) | 65.8±67.6 | (2, 1000) | < 0.0001 |
| C_i | 0.085±0.089 | (0, 1) | 0.150±0.138 | (0, 1) | < 0.0001 |
| Homophily (depression) | 0.0196±0.0501 | (0, 1.000) | 0.0031±0.0131 | (0, 0.667) | < 0.0001 |
| Registration period | 1389.4±659.2 | (122, 2885) | 1333.5±670.5 | (102, 2891) | < 0.0001 |
| Gender (female) | 16,872 (69.1%) | | 126,941 (55.4%) | | < 0.0001 |
| No. suicidal communities | 1.16±0.47 | (1, 6) | N/A | N/A | N/A |
| No. login days | 28.8±4.4 | (1, 31) | 26.9±6.3 | (1, 31) | < 0.0001 |

- Multivariate logistic regression of depressive symptoms on individual and network variables

| Variable | OR | CI | p-value | VIF |
|---|---|---|---|---|
| Age | 1.0141 | (1.0124, 1.0158) | < 0.0001 | 1.104 |
| Gender (female = 1) | 1.532 | (1.481, 1.585) | < 0.0001 | 1.019 |
| Community number | 1.00790 | (1.00778, 1.00803) | < 0.0001 | 1.155 |
| k_i | 0.99833 | (0.99810, 0.99856) | < 0.0001 | 1.154 |
| C_i | 0.0145 | (0.0118, 0.0178) | < 0.0001 | 1.079 |
| Homophily (depression) | 1.98 × 10^10 | (0.99 × 10^10, 4.02 × 10^10) | < 0.0001 | 1.022 |
| Registration period | 0.999744 | (0.999720, 0.999769) | < 0.0001 | 1.117 |

* OR: odds ratio; CI: 95% confidence interval; VIF: variance inflation factor

AUC: 0.866

- Univariate logistic regression of depressive symptoms on individual and network variables

| Variable | OR | CI | p-value | AUC |
|---|---|---|---|---|
| Age | 1.0110 | (1.0097, 1.0123) | < 0.0001 | 0.551 |
| Gender (female = 1) | 1.799 | (1.748, 1.850) | < 0.0001 | 0.568 |
| Community number | 1.00826 | (1.00814, 1.00837) | < 0.0001 | 0.860 |
| k_i | 1.00258 | (1.00243, 1.00274) | < 0.0001 | 0.566 |
| C_i | 0.000415 | (0.000338, 0.000509) | < 0.0001 | 0.692 |
| Homophily (depression) | 2.12 × 10^12 | (1.05 × 10^12, 4.28 × 10^12) | < 0.0001 | 0.658 |
| Registration period | 1.000126 | (1.000106, 1.000145) | < 0.0001 | 0.522 |

* OR: odds ratio; CI: 95% confidence interval; AUC: area under the curve