
Uploaded 2018/11/02, in Technology

Deep Learning JP:

http://deeplearning.jp/seminar-2/


- DEEP LEARNING JP

[DL Papers]
Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation (NIPS2018)

Kazuki Fujikawa, DeNA

http://deeplearning.jp/

1 - Summary

• Bibliographic information
  – Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation
    • NIPS 2018 (to appear)
    • Jiaxuan You, Bowen Liu, Rex Ying, Vijay Pande, Jure Leskovec

• Overview
  – Proposes the Graph Convolutional Policy Network (GCPN)
    • Generates molecular graphs that optimize desired properties through reinforcement learning
    • Learns a policy that optimizes domain-specific rewards and an adversarial loss
  – Outperforms existing methods on experiments such as molecular property optimization and property targeting
    • Because a valence check runs during generation, molecules that violate valence rules are never produced

2 - Outline

• Background

• Related work

• Proposed method

• Experiments and results

3 - Outline

• Background

• Related work

• Proposed method

• Experiments and results

4 - Background

• Generating graph structures that optimize an objective function is important in drug discovery and materials chemistry
  – Typical molecular structure design seeks ideal values for properties such as drug-likeness and synthesizability while obeying physical laws such as valence
  – Optimizing against complex, non-differentiable rules remains difficult

• Directly generating variable-length graphs is not easy
  – Compared with linear sequences such as natural language, the task is harder because of branching, multiple bond types, and the lack of a clear starting point

5

Figure from: Gomez-Bombarelli+, 2018 - Outline

• Background

• Related work

• Proposed method

• Experiments and results

6 - Related work

• Prior work can be divided into two groups by the data format it handles
  – Text-based
    • SMILES
      – A notation that expresses a molecule's chemical structure as a string
    • SMILES CFG (context-free grammar)
      – A sequence of production rules of a context-free grammar that generates SMILES
  – Graph-based
    • Generate the adjacency matrix directly
    • Generate nodes and bonds autoregressively (one row of the adjacency matrix at a time)

Example: benzene (SMILES: c1ccccc1)

Adjacency matrix:
0 1 0 0 0 1
1 0 1 0 0 0
0 1 0 1 0 0
0 0 1 0 1 0
0 0 0 1 0 1
1 0 0 0 1 0

SMILES CFG production rules:
smiles → chain
chain → chain, branched atom
chain → branched atom
branched atom → atom, ringbond
branched atom → atom
atom → aromatic organic
atom → aliphatic organic
ringbond → digit
aromatic organic → 'c'
aliphatic organic → 'C'
aliphatic organic → 'N'
digit → '1'
digit → '2'
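As a quick sanity check on the benzene example above, a short Python sketch (illustrative only, not from the slides) builds the ring adjacency matrix and confirms each carbon has exactly two ring neighbors:

```python
import numpy as np

def ring_adjacency(n):
    """Adjacency matrix of a simple n-atom ring (benzene for n = 6)."""
    a = np.zeros((n, n), dtype=int)
    for i in range(n):
        j = (i + 1) % n          # bond each atom to the next, wrapping around
        a[i, j] = a[j, i] = 1    # undirected bond, so the matrix is symmetric
    return a

benzene = ring_adjacency(6)       # corresponds to SMILES c1ccccc1
print(benzene[0])                 # first row: [0 1 0 0 0 1]
```

The first row matches the table above: atom 0 bonds to atoms 1 and 5 (the ring closure).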

7 - Related work (SMILES-based)

• Approaches that generate SMILES with text-based generative models
  – Automatic chemical design using a data-driven continuous representation of molecules [Gómez-Bombarelli+, 2018]
    • Learns a latent space by training an autoencoder / VAE to reconstruct the input SMILES
    • Optimizes the target property in that latent space with Bayesian optimization
  – ORGAN [Guimaraes+, 2017]
    • Optimizes SMILES string generation by an RNN decoder with GAN + RL
    • As in SeqGAN [Yu+, 2017], learns with the discriminator's average score as the reward
    • Simultaneously maximizes scores obtained from arbitrary heuristics (e.g., diversity)

(The slide reproduces excerpts from Guimaraes+ 2017; the key equations, cleaned up, are:

min_θ max_φ E_{Y∼p_data(Y)}[log D_φ(Y)] + E_{Y∼p_{G_θ}(Y)}[log(1 − D_φ(Y))]  (1)

J(θ) = E[R(Y_{1:T}) | s_0, θ] = Σ_{y_1∈Y} G_θ(y_1 | s_0) · Q(s_0, y_1)  (2)

MC^{G_θ}(Y_{1:t}; N) = {Y^1_{1:T}, …, Y^N_{1:T}}  (3)

where Y^n_{1:t} = Y_{1:t} and Y^n_{t+1:T} is stochastically sampled via the rollout policy G_θ, so that

Q(Y_{1:t−1}, y_t) = (1/N) Σ_{n=1}^{N} R(Y^n_{1:T}), Y^n_{1:T} ∈ MC^{G_θ}(Y_{1:t}; N), if t < T; R(Y_{1:T}), if t = T  (4)

∇_θ J(θ) ≈ Σ_{t=1}^{T} E_{y_t∼G_θ(y_t|Y_{1:t−1})}[∇_θ log G_θ(y_t | Y_{1:t−1}) · Q(Y_{1:t−1}, y_t)]  (5)

Finally, in SeqGAN the reward function is provided by D_φ.)

Figure 1 of ORGAN: D is trained as a classifier receiving as input a mix of real data and data generated by G; G is trained by RL where the reward is a combination of D and the objectives, passed back to the policy function via Monte Carlo sampling; non-unique sequences are penalized.

Figures from: Guimaraes+, 2017; Gomez-Bombarelli+, 2018

8
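The Monte Carlo rollout that SeqGAN/ORGAN use to estimate Q for a partial sequence can be sketched as follows; `rollout_policy` and `reward_fn` are hypothetical stand-ins, not the original implementation:

```python
def q_estimate(prefix, T, rollout_policy, reward_fn, N=16):
    """Estimate Q(state = prefix[:-1], action = prefix[-1]) by completing the
    sequence N times with the rollout policy and averaging the terminal reward."""
    if len(prefix) == T:               # full sequence: the reward is exact
        return reward_fn(prefix)
    total = 0.0
    for _ in range(N):                 # N-time Monte Carlo search
        seq = list(prefix)
        while len(seq) < T:            # complete the sequence token by token
            seq.append(rollout_policy(seq))
        total += reward_fn(seq)
    return total / N

# toy check: reward = fraction of 'a' tokens; the rollout always emits 'a'
est = q_estimate(['a', 'b'], T=4, rollout_policy=lambda s: 'a',
                 reward_fn=lambda s: s.count('a') / len(s))
print(est)  # every rollout yields ['a', 'b', 'a', 'a'], so the estimate is 0.75
```

In the real ORGAN setting the reward mixes the discriminator score with the domain objectives; here a trivial reward keeps the sketch self-contained.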

- Related work (graph-based)

• Approaches that directly generate molecular graphs with graph-based generative models
  – Learning deep generative models of graphs [Li+, 2018]
    • Generates nodes and bonds one by one, autoregressively
    • Extracts features from the partially generated graph with graph convolution and uses the result to decide the next node or bond to generate
  – Junction Tree Variational Autoencoder for Molecular Graph Generation [Jin+, 2018]
    • Converts the graph structure into a tree by grouping substructures such as rings into single nodes (tree decomposition)
    • Trains a VAE to reconstruct the tree structure
    • Also uses features extracted by graph convolution to map the tree back to a graph

(The slide reproduces Section 2 and Figures 2-3 of Jin+ 2018. Figure 2: "Comparison of two graph generation schemes: Structure by structure approach is preferred as it avoids invalid intermediate states (marked in red) encountered in node by node approach." Figure 3: "Overview of our method: A molecular graph G is first decomposed into its junction tree T_G, where each colored node in the tree represents a substructure in the molecule. We then encode both the tree and graph into their latent embeddings z_T and z_G. To decode the molecule, we first reconstruct junction tree from z_T, and then assemble nodes in the tree back to the original molecule." The excerpt describes a two-part latent representation z = [z_T, z_G], decoded first by a tree decoder p(T | z_T) and then a graph decoder p(G | T, z_G), which keeps generation chemically feasible.)

Jin+, 2018; Li+, 2018

9

- Outline

• Background

• Related work

• Proposed method

• Experiments and results

10 - Graph Generation as MDP

(Excerpt from the paper, shown on the slide:) Guimaraes et al. [26] and Sanchez-Lengeling et al. [33] further utilized an adversarial loss to the reinforcement learning reward to enforce similarity to a given molecule dataset. In contrast, instead of using a text-based molecular representation, our approach uses a graph-based molecular representation, which leads to many important benefits as discussed in the introduction. Jin et al. [15] proposed to use a variational autoencoder (VAE) framework, where the molecules are represented as junction trees of small clusters of atoms. This approach can only indirectly optimize molecular properties in the learned latent embedding space before decoding to a molecule, whereas our approach can directly optimize molecular properties of the molecular graphs. You et al. [41] used an auto-regressive model to maximize the likelihood of the graph generation process, but it cannot be used to generate attributed graphs. Li et al. [24] and Li et al. [25] described sequential graph generation models where conditioning labels can be incorporated to generate molecules whose molecular properties are close to specified target scores. However, these approaches are also unable to directly perform optimization on desired molecular properties. Overall, modeling the goal-directed graph generation task in a reinforcement learning framework is still largely unexplored.

3 Proposed Method

In this section we formulate the problem of graph generation as learning an RL agent that iteratively adds substructures and edges to the molecular graph in a chemistry-aware environment. We describe the problem definition, the environment design, and the Graph Convolutional Policy Network that predicts a distribution of actions which are used to update the graph being generated.

• The iterative graph generation process is formulated as an MDP
  – States: S = {s_t}
    • The intermediate graph the agent observes at time t
  – Actions: A = {a_t}
    • The set of actions describing a modification to the current graph at each step (e.g., adding a node or a bond)
  – Transition dynamics: P = p(s_{t+1} | s_t, …, s_0, a_t)
    • The probability of the state transition when action a_t is taken in states s_t, …, s_0
  – Rewards: R = {r(s_t)}
    • The reward function for reaching state s_t

3.1 Problem Definition (from the paper): We represent a graph G as (A, E, F), where A ∈ {0,1}^{n×n} is the adjacency matrix and F ∈ R^{n×d} is the node feature matrix, assuming each node has d features. We define E ∈ {0,1}^{b×n×n} to be the (discrete) edge-conditioned adjacency tensor, assuming there are b possible edge types: E_{i,j,k} = 1 if there exists an edge of type i between nodes j and k, and A = Σ_{i=1}^{b} E_i. The primary objective is to generate graphs that maximize a given property function S(G) ∈ R, i.e., maximize E_{G'}[S(G')], where G' is the generated graph and S could be one or multiple domain-specific statistics of interest. It is also of practical importance to constrain the model with two main sources of prior knowledge: (1) generated graphs need to satisfy a set of hard constraints, and (2) example graphs G ∼ p_data(G) are provided, incorporated by regularizing the property optimization objective with E_{G,G'}[J(G, G')] under a distance metric J(·,·). For molecule generation, the hard constraints are described by chemical valency, while the distance metric is an adversarially trained discriminator.

11

Figure 1: An overview of the proposed iterative graph generation method. Each row corresponds to one step in the generation process. (a) The state is defined as the intermediate graph G_t, and the set of scaffold subgraphs defined as C is appended for GCPN calculation. (b) GCPN conducts message passing to encode the state as node embeddings, then produces a policy π_θ. (c) An action a_t with 4 components is sampled from the policy. (d) The environment performs a chemical valency check on the intermediate state, and then returns (e) the next state G_{t+1} and (f) the associated reward r_t.

- Graph Convolutional Policy Network (GCPN)

• Feature extraction over the generated graph G_t and the candidate structures C via graph convolution
  – Candidate structures (scaffolds)
    • Candidate subgraphs to be newly attached to the generated graph
    • Sets of several atoms are conceivable, but this work assumes single atoms only
  – For the extended graph G_t ∪ C, features are extracted with a model that extends a GCN [Kipf+, 2017]
    • Kipf+'s method is extended to take bond types into account
    – The node embeddings H^(l) at layer l are convolved with weights W_i^(l) defined per bond type
    – After a nonlinear transformation, an AGG operation aggregates the results across bond types into H^(l+1):
      H^(l+1) = AGG(ReLU({D̃_i^(−1/2) Ẽ_i D̃_i^(−1/2) H^(l) W_i^(l)}, ∀i ∈ (1, …, b)))
    – E: the adjacency tensor with an added bond-type dimension; E_i is the i-th slice of E, Ẽ_i = E_i + I, and D̃_i = Σ_k Ẽ_{ijk}

12
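A minimal NumPy sketch of this per-bond-type convolution, assuming SUM aggregation and illustrative shapes (not the authors' code):

```python
import numpy as np

def gcpn_layer(E, H, W):
    """One edge-conditioned graph convolution layer in the style of GCPN.

    E: (b, n, n) adjacency tensor, one slice per bond type
    H: (n, d)    node embeddings at layer l
    W: (b, d, k) per-bond-type weight matrices
    Returns (n, k) embeddings at layer l+1, aggregating over bond types by SUM.
    """
    b, n, _ = E.shape
    out = np.zeros((n, W.shape[2]))
    for i in range(b):
        E_i = E[i] + np.eye(n)                    # self-loops: E~_i = E_i + I
        D_i = E_i.sum(axis=1)                     # degrees D~_i
        norm = E_i / np.sqrt(np.outer(D_i, D_i))  # D~^-1/2 E~ D~^-1/2
        out += np.maximum(norm @ H @ W[i], 0)     # ReLU, then SUM-aggregate
    return out

rng = np.random.default_rng(0)
E = np.zeros((3, 4, 4)); E[0, 0, 1] = E[0, 1, 0] = 1  # 4 atoms, one single bond
H = rng.normal(size=(4, 8))
W = rng.normal(size=(3, 8, 16))
print(gcpn_layer(E, H, W).shape)  # (4, 16)
```

Because self-loops guarantee every degree is at least one, the normalization never divides by zero.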

- Graph Convolutional Policy Network (GCPN)

• Action prediction
  – In the manner of link prediction on graphs, estimates a_{t+1} = concat(a_first, a_second, a_edge, a_stop)
    • The node embeddings computed on the previous slide decide which node to select first
      – f_first(s_t) = softmax(m_f(X)), a_first ∼ f_first(s_t) ∈ {0,1}^n (m_f is an MLP mapping ℝ^{n×k} → ℝ^n)
    • Information about the first selected node is also used to decide which node to select second
      – f_second(s_t) = softmax(m_s(X_{a_first}, X)), a_second ∼ f_second(s_t) ∈ {0,1}^{n+c}
    • Information about the two selected nodes decides the bond type
      – f_edge(s_t) = softmax(m_e(X_{a_first}, X_{a_second})), a_edge ∼ f_edge(s_t) ∈ {0,1}^b
    • Information about the whole current graph decides whether to terminate the generation process
      – f_stop(s_t) = softmax(m_t(AGG(X))), a_stop ∼ f_stop(s_t) ∈ {0,1}

13
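A sketch of sampling the four action components from node embeddings X; the weight vectors and matrices below stand in for the MLPs m_f, m_s, m_e, m_t and are hypothetical simplifications:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()               # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum()

def sample_action(X, n, b, Wf, Ws, We, Wt):
    """Sample (a_first, a_second, a_edge, a_stop) link-prediction style.

    X: (n + c, k) embeddings of the n graph nodes plus c scaffold nodes."""
    # a_first: choose among the n existing graph nodes
    a_first = int(rng.choice(n, p=softmax(X[:n] @ Wf)))
    # a_second: condition on the first node, choose among all n + c nodes
    pair = np.concatenate([np.tile(X[a_first], (len(X), 1)), X], axis=1)
    a_second = int(rng.choice(len(X), p=softmax(pair @ Ws)))
    # a_edge: bond type from the two selected nodes' embeddings
    a_edge = int(rng.choice(b, p=softmax(np.concatenate([X[a_first], X[a_second]]) @ We)))
    # a_stop: terminate or not, from a mean-pooled (AGG) graph embedding
    a_stop = int(rng.choice(2, p=softmax(X.mean(axis=0) @ Wt)))
    return a_first, a_second, a_edge, a_stop

k, n, c, b = 8, 5, 3, 3
X = rng.normal(size=(n + c, k))
action = sample_action(X, n, b,
                       rng.normal(size=k),
                       rng.normal(size=2 * k),
                       rng.normal(size=(2 * k, b)),
                       rng.normal(size=(k, 2)))
print(action)
```

Each component is drawn from its own softmax distribution, matching the factorized policy described above.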

- Graph Convolutional Policy Network (GCPN): state transition / reward

• State transition
  – A valence check is run on the molecule after adding the node / edge proposed by the generator; if the molecule is invalid at that point, the state is not updated and an action is sampled again

• Reward
  – Step reward
    • Whether the valence rules were violated + adversarial reward V(π_θ, D_φ)
    • For the adversarial reward, the discriminator is trained following the standard GAN framework
  – Final reward
    • Domain-specific reward (a combination of logP, QED, molecular weight, etc.) + adversarial reward V(π_θ, D_φ)

14
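A toy version of this valence-checked transition; the maximum-valence table is a simplified illustration, and the real environment's chemistry check is more complete:

```python
MAX_VALENCE = {'C': 4, 'N': 3, 'O': 2}   # simplified valence table (illustrative)

def try_add_bond(atoms, bonds, i, j, order):
    """Attempt one environment step: add a bond of the given order between
    atoms i and j. If either atom would exceed its allowed valence, reject
    the action and leave the state unchanged so the agent can resample."""
    used = [0] * len(atoms)
    for (a, b, o) in bonds:                  # tally current valence usage
        used[a] += o
        used[b] += o
    if used[i] + order > MAX_VALENCE[atoms[i]] or used[j] + order > MAX_VALENCE[atoms[j]]:
        return bonds, False                  # invalid: state not updated
    return bonds + [(i, j, order)], True     # valid: next state G_{t+1}

atoms = ['C', 'O']
bonds, ok = try_add_bond(atoms, [], 0, 1, 2)    # C=O double bond: valid
print(ok)                                        # True
bonds, ok = try_add_bond(atoms, bonds, 0, 1, 1)  # O already at valence 2: rejected
print(ok)                                        # False
```

Rejecting the action instead of ending the episode mirrors the slide: the state stays at G_t and the policy simply samples another action.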

3 - and policy gradient respectively. Guimaraes et al. [26] and Sanchez-Lengeling et al. [33] further

utilized an adversarial loss to the reinforcement learning reward to enforce similarity to a given

molecule dataset. In contrast, instead of using a text-based molecular representation, our approach

uses a graph-based molecular representation, which leads to many important beneﬁts as discussed

in the introduction. Jin et al. [15] proposed to use a variational autoencoder (VAE) framework,

where the molecules are represented as junction trees of small clusters of atoms. This approach can

only indirectly optimize molecular properties in the learned latent embedding space before decoding

to a molecule, whereas our approach can directly optimize molecular properties of the molecular

graphs. You et al. [41] used an auto-regressive model to maximize the likelihood of the graph

generation process, but it cannot be used to generate attributed graphs. Li et al. [24] and Li et al.

[25] described sequential graph generation models where conditioning labels can be incorporated

to generate molecules whose molecular properties are close to speciﬁed target scores. However,

these approaches are also unable to directly perform optimization on desired molecular properties.

Overall, modeling the goal-directed graph generation task in a reinforcement learning framework is

still largely unexplored.

3 Proposed Method

In this section we formulate the problem of graph generation as learning an RL agent that iteratively adds substructures and edges to the molecular graph in a chemistry-aware environment. We describe the problem definition, the environment design, and the Graph Convolutional Policy Network that predicts a distribution of actions which are used to update the graph being generated.

3.1 Problem Definition

We represent a graph G as (A, E, F), where A ∈ {0, 1}^{n×n} is the adjacency matrix, and F ∈ R^{n×d} is the node feature matrix assuming each node has d features. We define E ∈ {0, 1}^{b×n×n} to be the (discrete) edge-conditioned adjacency tensor, assuming there are b possible edge types. E_{i,j,k} = 1 if there exists an edge of type i between nodes j and k, and A = Σ_{i=1}^{b} E_i. Our primary objective is to generate graphs that maximize a given property function S(G) ∈ R, i.e., maximize E_{G'}[S(G')], where G' is the generated graph, and S could be one or multiple domain-specific statistics of interest.

It is also of practical importance to constrain our model with two main sources of prior knowledge. (1) Generated graphs need to satisfy a set of hard constraints. (2) We provide the model with a set of example graphs G ∼ p_data(G), and would like to incorporate such prior knowledge by regularizing the property optimization objective with E_{G,G'}[J(G, G')] under distance metric J(·, ·). In the case of molecule generation, the set of hard constraints is described by chemical valency while the distance metric is an adversarially trained discriminator.

Policy learning via a policy-gradient method

• The policy is learned with Proximal Policy Optimization (PPO) [Schulman+, 2017]

– Vanilla policy gradient: $L^{PG}(\theta) = \hat{\mathbb{E}}_t\left[\log \pi_\theta(a_t \mid s_t)\, \hat{A}_t\right]$

– Conservative Policy Iteration (CPI): focuses on the probability ratio to the previous policy: $L^{CPI}(\theta) = \hat{\mathbb{E}}_t\left[\frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}\, \hat{A}_t\right] = \hat{\mathbb{E}}_t\left[r_t(\theta)\, \hat{A}_t\right]$

– Proximal Policy Optimization (PPO): stabilizes learning by limiting the size of each policy update: $L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\, \hat{A}_t,\ \mathrm{clip}(r_t(\theta),\, 1-\varepsilon,\, 1+\varepsilon)\, \hat{A}_t\right)\right]$
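The PPO clipped surrogate objective can be sketched in a few lines. This is an illustrative NumPy version; the function name and example values are my own, not from the paper or any specific RL library:

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Clipped PPO surrogate L^CLIP(theta) (to be maximized).

    logp_new, logp_old: log pi(a_t|s_t) under the current and old policies.
    advantages: advantage estimates A_t.
    """
    ratio = np.exp(logp_new - logp_old)                    # r_t(theta)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return np.minimum(unclipped, clipped).mean()           # E_t[min(...)]
```

With eps = 0.2, a probability ratio of 2 with a positive advantage is clipped down to 1.2, so a single update gets no incentive to move the policy further than 1 + ε from the old one.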

15

Figure 1: An overview of the proposed iterative graph generation method. Each row corresponds to one step in the generation process. (a) The state is defined as the intermediate graph G_t, and the set of scaffold subgraphs defined as C is appended for GCPN calculation. (b) GCPN conducts message passing to encode the state as node embeddings then produce a policy π_θ. (c) An action a_t with 4 components is sampled from the policy. (d) The environment performs a chemical valency check on the intermediate state, and then returns (e) the next state G_{t+1} and (f) the associated reward r_t.
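The graph representation in the problem definition, an edge-conditioned tensor E whose per-type slices sum to the adjacency matrix A, can be sketched in NumPy. The sizes and bond assignments below are illustrative only:

```python
import numpy as np

b, n = 3, 4  # e.g. 3 bond types (single/double/triple), 4 atoms
E = np.zeros((b, n, n), dtype=int)  # edge-conditioned adjacency tensor

def add_edge(E, bond_type, j, k):
    # Undirected edge of a given type between nodes j and k
    E[bond_type, j, k] = E[bond_type, k, j] = 1

add_edge(E, 0, 0, 1)  # single bond 0-1
add_edge(E, 1, 1, 2)  # double bond 1-2
add_edge(E, 0, 2, 3)  # single bond 2-3

A = E.sum(axis=0)  # A = sum_{i=1}^{b} E_i, the plain adjacency matrix
```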

Outline

• Background

• Related work

• Proposed method

• Experiments & results

16 - Experimental setup

• Dataset

– 250k molecules sampled from ZINC

– Maximum number of atoms: 38; atom types: 9; bond types: 3

• GCPN settings

– 3 hidden layers of 64 units, with Batch Normalization applied to the output of each layer

– SUM is used as the aggregation function

– Learning rate 0.001 for RL, 0.00025 for expert pretraining

– Adam optimizer, batch size 32

• Baselines

– JT-VAE and ORGAN

17 - Experiment 1: Property optimization

• Two property scores were maximized:

– Penalized logP: a logP (hydrophobicity) score that also accounts for ring size and synthetic accessibility

– QED: a measure of drug-likeness

• Consistently outperformed existing methods

– logP: roughly 61% improvement over JT-VAE and roughly 186% over ORGAN

– Thanks to the step-wise valency check, no invalid molecules were generated at all

• Occasionally, unrealistic molecules with very high scores were generated

– As with the molecule at the bottom right of Figure 2(a), some outputs exploit flaws in the score function: excellent penalized logP, but an unrealistic structure
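The step-wise valency check can be illustrated with a minimal sketch. The valence table and function here are hypothetical simplifications (a real implementation would use a chemistry toolkit such as RDKit and handle charges and aromaticity):

```python
# Illustrative maximum valences; simplified for the sketch.
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}

def bond_addition_valid(atom_types, bond_orders, j, k, order):
    """Return True iff adding a bond of the given order between atoms
    j and k keeps both atoms within their maximum valence.

    atom_types: element symbol per node, e.g. ["C", "O"].
    bond_orders: current total bond order per node.
    """
    return (bond_orders[j] + order <= MAX_VALENCE[atom_types[j]]
            and bond_orders[k] + order <= MAX_VALENCE[atom_types[k]])
```

An environment using a check like this can reject an invalid action before applying it, which is why no valency-violating molecule is ever emitted.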

Figure 2: Samples of generated molecules in property optimization and constrained property optimization task. In (c), the two columns correspond to molecules before and after modification.

18


Experiment 2: Property targeting

• Experiments aimed at keeping logP and molecular weight within specified ranges

– Evaluated on whether the scores fall within the target range, and also on the diversity of the generated molecules

– Diversity is measured as the mean pairwise Tanimoto distance between the Morgan fingerprints of the generated molecules

• Larger values indicate higher diversity

• Consistently outperformed existing methods on range control

– Diversity is somewhat lower than that of some other methods, but not critically so; the method achieves both range control and diversity
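The diversity metric (mean pairwise Tanimoto distance) can be sketched as follows. Here a fingerprint is simply a set of "on" bit indices; in practice the Morgan fingerprint bit vectors would come from a toolkit such as RDKit:

```python
from itertools import combinations

def tanimoto_distance(fp_a, fp_b):
    """1 - Tanimoto similarity between two fingerprints (sets of on-bits)."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints: treat as identical
    return 1.0 - len(fp_a & fp_b) / union

def mean_pairwise_diversity(fingerprints):
    """Mean Tanimoto distance over all pairs of generated molecules."""
    dists = [tanimoto_distance(a, b)
             for a, b in combinations(fingerprints, 2)]
    return sum(dists) / len(dists)
```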

19 - Experiment 3: Constrained property optimization

• Experiments aimed at jointly achieving similarity to a given molecule and a high penalized logP

– After optimizing similarity to each of 800 molecules picked from ZINC, penalized logP is optimized

– Since JT-VAE cannot be steered by an objective function, its outputs were filtered by a similarity threshold δ

• Consistently outperformed existing methods

– Achieved an average improvement of 148% in penalized logP

– Succeeded, at a reasonable level of quality, in generating new molecules that optimize the objective while preserving substructures of the original molecule

20 - Conclusion

• Proposed the Graph Convolutional Policy Network (GCPN) and applied it to molecular design

– Outperformed existing methods on tasks such as property optimization and property targeting

– Because a valency check is performed during generation, molecules that violate valency rules are never produced

• GCPN is a general framework that can be applied beyond molecule generation

– It should also be applicable to domains such as electronic circuits and social networks by swapping in a domain-specific objective function

21 - References

• Text-based generative models

– Gómez-Bombarelli, Rafael, et al. "Automatic chemical design using a data-driven continuous representation of molecules." ACS Central Science 4.2 (2018): 268-276.

– Guimaraes, Gabriel Lima, et al. "Objective-reinforced generative adversarial networks (ORGAN) for sequence generation models." arXiv preprint arXiv:1705.10843 (2017).

– Yu, Lantao, et al. "SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient." AAAI (2017).

• Graph-based generative models

– You, Jiaxuan, et al. "Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation." NIPS (2018, to appear).

– Li, Yujia, et al. "Learning deep generative models of graphs." arXiv preprint arXiv:1803.03324 (2018).

– Jin, Wengong, Regina Barzilay, and Tommi Jaakkola. "Junction Tree Variational Autoencoder for Molecular Graph Generation." ICML (2018).

– Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." ICLR (2017).

• Others

– Schulman, John, et al. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017).

22