- Learning Semantic Representations

2014/06/05 PFI Seminar

Hiroshi Noji (能地 宏)

SOKENDAI D2 / NII - What Is Meaning?

‣ Taking the question seriously is the business of philosophy of language?

- ‐ For example: verificationism (Quine)

- ‐ "The meaning of a sentence is nothing but its verification conditions: the set of possible experiences that would count toward showing the sentence to be true."

‣ We would rather not ask what meaning is (you cannot finish a PhD that way)

‣ In NLP, the goal is instead a computation of meaning that lets us solve problems such as:

- ‐ question answering (QA)

- ‐ recognizing textual entailment

- ‐ document classification (clustering), sentiment analysis, etc.

1 /37 - Today's Focus: Question Answering

The Big Picture (Liang et al.'11)

What is the most populous city in California?  →  System + Database  →  Los Angeles

Expensive: logical forms [Zelle & Mooney, 1996; Zettlemoyer & Collins, 2005; Wong & Mooney, 2007; Kwiatkowski et al., 2010]

‣ Goal: build a system that understands the meaning of a question and finds the answer in a database

What is the most populous city in California?
⇒ argmax(λx.city(x) ∧ loc(x, CA), λx.pop.(x))

How many states border Oregon?
⇒ count(λx.state(x) ∧ border(x, OR))

・・・

Logical forms let us temporarily sidestep intractable philosophical questions (what does border denote? what is meaning in general?). Liang et al.'13

2 /37
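A toy sketch of what "executing" these logical forms means: with a small hand-made database, count and argmax become ordinary queries. All facts and predicate names below are illustrative stand-ins (not the actual Geo880 database).

```python
# Toy database; every entity, fact, and population figure here is made up
# for illustration only.
STATES = {"CA", "OR", "WA", "NV", "ID", "AZ"}
BORDERS = {("CA", "OR"), ("WA", "OR"), ("NV", "OR"), ("ID", "OR")}
CITIES = {"Los Angeles": ("CA", 3_900_000),
          "San Diego": ("CA", 1_400_000),
          "Portland": ("OR", 650_000)}
ENTITIES = STATES | set(CITIES)

def state(x): return x in STATES
def border(x, y): return (x, y) in BORDERS
def city(x): return x in CITIES
def loc(x, s): return city(x) and CITIES[x][0] == s
def pop(x): return CITIES[x][1]

def count(pred):                      # count(λx. …)
    return sum(1 for x in ENTITIES if pred(x))

def argmax(pred, score):              # argmax(λx. …, λx. …)
    return max((x for x in ENTITIES if pred(x)), key=score)

# How many states border Oregon?
print(count(lambda x: state(x) and border(x, "OR")))                 # 4 with these toy facts

# What is the most populous city in California?
print(argmax(lambda x: city(x) and loc(x, "CA"), pop))               # Los Angeles
```

The logical form thus acts as a program: once the sentence has been mapped to it, the answer follows deterministically from the database.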

2 - Today's Scope

‣ A discussion of how meaning is represented in the question-answering field

- ‐ I will probably not say much about learning itself

- ‐ Mainly: an introduction to two semantic representations, and recent progress on both

[Figure: two analyses of "most populous city in California": a CCG derivation with λ-terms (city, population, loc, argmax, CA) and the corresponding DCS tree (Dependency-Based Compositional Semantics)]

3 /37

- Problem Setting: Learning Semantic Representations

The Big Picture (Liang et al.'11): What is the most populous city in California?  →  System + Database  →  Los Angeles

Convert natural language into a semantic representation the computer can understand: a logical form (a kind of programming language).
[Zelle & Mooney, 1996; Zettlemoyer & Collins, 2005; Wong & Mooney, 2007; Kwiatkowski et al., 2010]

sentence → (ambiguity: hard!) → semantic representation → (deterministic: easy!) → answer

What is the most populous city in California?
⇒ argmax(λx.city(x) ∧ loc(x, CA), λx.pop.(x))

How many states border Oregon?
⇒ count(λx.state(x) ∧ border(x, OR))

・・・

(Other representations exist as well.)

Given a sentence, it suffices to recover the correct semantic representation, or the answer itself → supervised learning

4 /37

4 - How the Supervision Is Given

sentence → (ambiguity: hard!) → semantic representation → (deterministic: easy!) → answer

(sentence, semantic representation) pairs:

How many states border Oregon?
count(λx.state(x) ∧ border(x, OR))

- ‐ annotation is expensive

- ‐ learning is easier

(sentence, answer) pairs:

How many states border Oregon?
3

- ‐ non-experts can annotate

- ‐ learning is harder

5 /37 - Two Major Families of Semantic Representations

CCG + logical-form family (learned from (sentence, logical form) pairs):
Zettlemoyer & Collins'05; Zettlemoyer & Collins'07; Kwiatkowski et al.'10; Kwiatkowski et al.'11; Kwiatkowski et al.'13; ・・・

DCS family (learned from (sentence, answer) pairs):
Liang et al.'11; Berant et al.'13; Berant and Liang'14

Beyond QA: Artzi & Zettlemoyer'11; Artzi & Zettlemoyer'13; Matuszek et al.'12

There is also a Tree Grammar family, omitted here.


6 /37 - Checking the Problem Setting

‣ We want to build a system that converts natural sentences into logical forms

- ‐ Similar to machine translation? (such approaches do exist)

- ‐ The difference: a logical form has internal structure

count(λx.state(x) ∧ border(x, OR))

- ‐ We want to obtain the formula by composing functions

- ‐ We want a framework that computes the logical form along the structure of the sentence

- ‐ CCG is the tool we use for this

How many states border Oregon?
λf.λg.count(λx.f(x)∧g(x))   λx.state(x)   λx.border(x, OR)
⇒ λg.count(λx.state(x)∧g(x))
⇒ count(λx.state(x) ∧ border(x, OR))
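The composition above can be mimicked directly with higher-order functions. A minimal sketch, treating each λ-term as a Python function over a toy database (all facts below are made up for illustration):

```python
# Toy facts; predicate names mirror the slide's logical form.
STATES = {"CA", "WA", "NV", "ID", "OR"}
BORDERS = {("CA", "OR"), ("WA", "OR"), ("NV", "OR"), ("ID", "OR")}

def count(pred):
    return sum(1 for e in STATES if pred(e))

# λf.λg.count(λx.f(x) ∧ g(x))      -- "How many"
how_many = lambda f: lambda g: count(lambda x: f(x) and g(x))

# λx.state(x)                      -- "states"
states = lambda x: x in STATES

# λx.border(x, OR)                 -- "border Oregon"
border_oregon = lambda x: (x, "OR") in BORDERS

# λg.count(λx.state(x) ∧ g(x))     -- result after one application
partial = how_many(states)

# count(λx.state(x) ∧ border(x, OR))
print(partial(border_oregon))      # 4 with these toy facts
```

Each application step corresponds to one composition step in the parse; the full query emerges only once all arguments are filled in.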

7 /37 - Combinatory Categorial Grammar

CCG = combinatory rules + categorial grammar

A theory of grammar for describing the structure of sentences.

Dependency Grammar: John ←sbj loves obj→ Mary

Categorial Grammar:
  NP  S\NP/NP  NP  ⇒  S\NP  ⇒  S   over "John loves Mary"

Context-Free Grammar (CFG): S → NP VP   over "John loves Mary"

It looks similar to CFG, but...

8 /37 - Categorial Grammar (CG)

• Each word carries a category

• Atomic categories: S, NP, N

• Complex categories are built with the slashes "/" and "\"

– X/Y combines with a Y on its right to form an X

– X\Y combines with a Y on its left to form an X

• Examples:

– walked ⊢ S\NP combines with John ⊢ NP to form an S

– a transitive verb has the category S\NP/NP

From Yusuke Miyao (2012), "The Relation between Syntactic Parsing and Linguistic Theory in Natural Language Processing" (宮尾祐介, 2012).

9 /37 - Combinatory Rules

‣ There is a small set of combinatory rules

- ‐ forward application (>):   X/Y   Y   ⇒   X

- ‐ backward application (<):   Y   X\Y   ⇒   X

Example tree for "John loves Mary":
[S [NP John] [S\NP [S\NP/NP loves] [NP Mary]]]

Any category may stand in for X and Y; the grammar defines only these few rules.

A CCG derivation is usually written in the form of a proof:

John    loves      Mary
NP      S\NP/NP    NP
        --------------- >
        S\NP
----------------------- <
S

10 /37 - Computing the Semantic Representation

‣ The advantage of using CCG: meaning can be computed along the tree structure

- ‐ Each word is given, along with its category, a meaning represented as a λ-term

- ‐ Each rule also specifies how the logical forms are composed

forward application (>):   X/Y: f   Y: g   ⇒   X: f(g)
backward application (<):   Y: g   X\Y: f   ⇒   X: f(g)

Lexicon:
John ⊢ NP: john
loves ⊢ S\NP/NP: λx.λy.love(y,x)
Mary ⊢ NP: mary

John       loves                       Mary
NP: john   S\NP/NP: λx.λy.love(y,x)    NP: mary
           ------------------------------------ >
           S\NP: λy.love(y,mary)
--------------------------------- <
S: love(john,mary)
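The two application rules, together with the λ-terms, are easy to sketch in code. The tuple encoding of categories and the three-word lexicon below are illustrative assumptions, not a full CCG implementation:

```python
# Sketch of the slide's derivation: categories paired with λ-terms,
# combined by forward (>) and backward (<) application.

def love(y, x):                     # represent love(y, x) as a plain tuple
    return ("love", y, x)

# word ⊢ category: semantics (complex slash categories as triples)
john  = ("NP", "john")
mary  = ("NP", "mary")
loves = (("S\\NP", "/", "NP"), lambda x: lambda y: love(y, x))

def forward(left, right):           # X/Y: f   Y: g   ⇒   X: f(g)
    (x, slash, y), f = left
    cat, g = right
    assert slash == "/" and cat == y
    return (x, f(g))

def backward(left, right):          # Y: g   X\Y: f   ⇒   X: f(g)
    cat, g = left
    xy, f = right
    x, y = xy.split("\\")
    assert cat == y
    return (x, f(g))

loves_mary = forward(loves, mary)   # S\NP: λy.love(y, mary)
sentence = backward(john, loves_mary)
print(sentence)                     # ('S', ('love', 'john', 'mary'))
```

Syntax and semantics travel together: each rule application both checks the categories and applies one function to its argument.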

11 /37 - CCG-based: Overview

Zettlemoyer & Collins'05,'09; Kwiatkowski et al.'10,'11

Training data: a set of (sentence, logical form) pairs

What we know in advance:
・the CCG composition rules
  Y: g   X\Y: f   ⇒   X: f(g)
  X/Y: f   Y/Z: g   ⇒   X/Z: λx.f(g(x))
  ・・・
・loose candidates for each word's category

The correct tree structure is not given.

Machine learning: learn the model parameters relying only on the logical form of each sentence.

Test (evaluation): How many states border Oregon? → ???

12 /37

- The Correct Tree Structure Is Not Given

[Figure 2 of Zettlemoyer & Collins'05: two example CCG parses. (a) "Utah borders Idaho" ⇒ S: borders(utah, idaho); (b) "What states border Texas" ⇒ S: λx.state(x) ∧ borders(x, texas)]

Objective function:

‣ a kind of distant supervision

- ‐ no need to annotate tree structures

- ‐ harder than ordinary syntactic parsing

Learning:

- ‐ latent-variable structured perceptron

- ‐ a connection to grammar acquisition?

13 /37

- Connection to Grammar Acquisition (an Aside)

Unsupervised parsing: estimate the model from completely raw sentences (unsupervised learning)

  you have another cookie  →  (unlabeled dependency tree)

Klein & Manning'04; Smith & Eisner'06; Headden III et al.'09; Mareček & Žabokrtský'11; ・・・

‣ Two goals:

- ‐ Scientific: clarify the mechanism by which infants acquire language

- ‐ Engineering: help parse languages that have no annotated data

- ‐ However, infants exploit many signals beyond language itself when acquiring grammar (so for the scientific goal, this setting is not very realistic)

14 /37 - Connection to Grammar Acquisition (an Aside)

[Paper shown: Tom Kwiatkowski, Sharon Goldwater, Luke Zettlemoyer, Mark Steedman, "A Probabilistic Model of Syntactic and Semantic Acquisition from Child-Directed Utterances and their Meanings"]

‣ Our present problem setting

- ‐ infer the structure of the sentence (a latent variable) from (sentence, logical form) pairs

- ‐ from the grammar-acquisition viewpoint, arguably more realistic than learning from raw sentences alone?

‣ A still more realistic task: Kwiatkowski et al.'12

- ‐ which candidate meaning is correct is not known

Utterance: you have another cookie
Candidate meanings:
  have(you, another(x, cookie(x)))
  eat(you, your(x, cake(x)))
  want(i, another(x, cookie(x)))

Learning when each sentence comes with several candidate meanings.

15 /37

- How Do We Learn?

[Figure 2 of Zettlemoyer & Collins'05 again: CCG parses of "Utah borders Idaho" and "What states border Texas"]

Starting from the logical form, extract plausible word-level candidate categories:
  What ⊢ S/S: λx.x
  What ⊢ S/(S\NP)/(S\NP): λg.λf.λx.g(x) ∧ f(x)
  states ⊢ S\NP: λx.state(x)
  S/(S\NP): λf.λx.state(x) ∧ f(x)
  S\NP: λx.borders(x, texas)
  ・・・

‣ The model is a log-linear model over tree structures

- ‐ It mainly learns which category each word should be paired with

16 /37

\NP )

S. We can draw a parse tree (or derivation)

<

of Utah borders

ple function

strings of

application

length greater rules

than in 1(a)

one, and

for e 1(b).1 Additional

xample

S

Idaho as follows:

combinatory rules allow CCGs to give an elegant treatment

the Mississippi

:= NP : mississippi river

of linguistic phenomena such as coordination and relative

Note that we use the notation > and < to denote appli-

Utah

borders

Idaho

clauses.

This leads In

to our

a

work

relativ we

ely mak

mi e

nor use of standard

change to the rules of

cation of rules 1(a) and 1(b) respectively.

formalism,

N P

(S\NP )/NP

N P

application, forward and backward composition, and type-

which in practice can be very useful. For example, it is eas-

>

CCGs typically include a

(

semantic type, as well as a syn-

raising. In addition, we allow lexical entries consisting of

S\NP )

ier to directly represent the fact that the Mississippi refers

tactic type, for each lexical entry. For example, <

our lexicon

strings of length greater than one, for example

S

would be extended as follows:
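The two application rules above can be sketched in a few lines. This is only an illustrative toy encoding (categories as nested tuples, semantics as Python lambdas, all names invented), not any real CCG implementation:

```python
# Toy sketch of CCG functional application (rules 1a and 1b).
# A category is a basic string ("NP", "S", "N") or (result, slash, argument).
BORDERS_CAT = (("S", "\\", "NP"), "/", "NP")  # (S\NP)/NP

def forward(left, right):
    """A/B  B  =>  A  (rule 1a): apply left's semantics to right's."""
    cat_l, sem_l = left
    cat_r, sem_r = right
    if isinstance(cat_l, tuple) and cat_l[1] == "/" and cat_l[2] == cat_r:
        return (cat_l[0], sem_l(sem_r))

def backward(left, right):
    """B  A\\B  =>  A  (rule 1b): apply right's semantics to left's."""
    cat_l, sem_l = left
    cat_r, sem_r = right
    if isinstance(cat_r, tuple) and cat_r[1] == "\\" and cat_r[2] == cat_l:
        return (cat_r[0], sem_r(sem_l))

utah = ("NP", "utah")
idaho = ("NP", "idaho")
borders = (BORDERS_CAT, lambda y: lambda x: ("borders", x, y))

vp = forward(borders, idaho)   # "borders Idaho" : S\NP
s = backward(utah, vp)         # "Utah borders Idaho" : S
print(s)                       # ('S', ('borders', 'utah', 'idaho'))
```

The derivation mirrors the parse tree drawn above: `>` corresponds to `forward`, `<` to `backward`.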

- Evolution of the methods
‣ Zettlemoyer & Collins ’05
- The first to learn a CCG from pairs of sentences and logical forms
- The categories of some function words are fixed by hand
  e.g., every ⊢ (S/(S|NP))/N : λf.λg.∀x.f(x) → g(x)
‣ Kwiatkowski et al. ’10
- Learns categories for all words (making learning possible for languages other than English)
- To get good initial values, IBM Model 1 is used first
  (parameter initialization: compute co-occurrence between words and logical constants)
  I want a flight to Boston ⊢ S : λx.flight(x) ∧ to(x, BOS)
‣ Kwiatkowski et al. ’11
- Factors the parameters of categories to reduce sparseness
  (initial score for new lexical entries: average over pairwise weights; cf. Artzi et al. ’13)

17 /37 - Two broad families of meaning representations
CCG + logical-form family:
- Learned from (sentence, logical form) pairs: Zettlemoyer & Collins ’05, ’07; Kwiatkowski et al. ’10, ’11
- Learned from (sentence, answer) pairs: Kwiatkowski et al. ’13
- Beyond QA: Artzi & Zettlemoyer ’11, ’13; Matuszek et al. ’12
DCS family:
- Learned from (sentence, answer) pairs: Liang et al. ’11; Berant et al. ’13; Berant and Liang ’14
There is also a Tree Grammar family, omitted here.

18 /37 - Learning from (sentence, answer) pairs
Graphical model:
[Figure: x = "capital of California?" goes through semantic parsing p(z | x, θ) (probabilistic, parameters θ) to a DCS tree z over capital/CA, then through interpretation p(y | z, w) (deterministic, database w) to the answer y = Sacramento]
‣ Previously, the CCG derivation was modeled as the latent variable
‣ In DCS, the logical representation itself is treated as the latent variable
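This two-stage factorization can be sketched as follows. The candidate parses, features, and toy database below are invented for illustration; the real model scores full derivations over a much larger candidate space:

```python
import math

# Sketch of the two-stage model: semantic parsing p(z | x, theta) is a
# log-linear distribution over candidate logical forms z; interpretation
# p(y | z, w) is deterministic execution against the database w.

def p_z_given_x(candidates, theta):
    """Log-linear distribution over candidate logical forms."""
    scores = [sum(theta.get(f, 0.0) for f in feats) for _, feats in candidates]
    norm = sum(math.exp(s) for s in scores)
    return [math.exp(s) / norm for s in scores]

def p_y_given_x(candidates, theta, execute, y):
    """p(y | x) marginalizes over the latent logical form z."""
    probs = p_z_given_x(candidates, theta)
    return sum(p for (z, _), p in zip(candidates, probs) if execute(z) == y)

# Tiny example: two candidate parses of "capital of California?"
database = {("capital", "CA"): "Sacramento", ("capital", "MA"): "Boston"}
execute = lambda z: database.get(z)
candidates = [(("capital", "CA"), ["word=California:pred=CA"]),
              (("capital", "MA"), ["word=California:pred=MA"])]
theta = {"word=California:pred=CA": 2.0}
print(round(p_y_given_x(candidates, theta, execute, "Sacramento"), 3))  # 0.881
```

Training pushes θ so that the probability mass of parses executing to the observed answer grows, without ever seeing a gold logical form.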

19 /37 - DCS: Dependency-based Compositional Semantics
Example: city in California
[Figure: DCS tree city -1/1-> loc -2/1-> CA, with constraints c ∈ city, c1 = ℓ1, ℓ ∈ loc, ℓ2 = s1, s ∈ CA, against database tables city = {San Francisco, Chicago, Boston, …}, loc = {(Mount Shasta, California), (San Francisco, California), (Boston, Massachusetts), …}, CA = {California}]
Each subtree denotes a set; the loc subtree denotes the elements of loc whose second column is California.
A DCS tree encodes a constraint satisfaction problem (CSP).
Computation: dynamic programming ⇒ time = O(# nodes)
20 /37
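The CSP reading of a basic DCS tree can be evaluated bottom-up in a few lines. The tree encoding and the toy database below are invented for illustration (this is not Liang et al.'s implementation):

```python
# Minimal sketch of evaluating a basic DCS tree (join edges only).
# A node is (predicate, [((i, j), child), ...]); an edge (i, j) constrains
# column i of the parent tuple to equal column j of some child tuple.

DB = {
    "city": {("San Francisco",), ("Chicago",), ("Boston",)},
    "loc":  {("Mount Shasta", "California"),
             ("San Francisco", "California"),
             ("Boston", "Massachusetts")},
    "CA":   {("California",)},
}

def denotation(node):
    """Return the set of tuples satisfying the subtree rooted at node."""
    pred, edges = node
    rows = DB[pred]
    for (i, j), child in edges:
        # keep parent tuples whose i-th column matches the j-th column
        # of some feasible child tuple (the join constraint on the edge)
        keep = {row[j - 1] for row in denotation(child)}
        rows = {row for row in rows if row[i - 1] in keep}
    return rows

# "city in California": city -1/1-> loc -2/1-> CA
tree = ("city", [((1, 1), ("loc", [((2, 1), ("CA", []))]))])
print(denotation(tree))  # {('San Francisco',)}
```

Because each edge only joins a node with its children, evaluation visits every node once, matching the O(# nodes) dynamic program on the slide.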

- DCS (continued)
Example: major city in California
  z = ⟨city; 1/1:⟨major⟩; 1/1:⟨loc; 2/1:⟨CA⟩⟩⟩
(a) DCS tree  (b) Lambda calculus formula:
  λc. ∃m ∃ℓ ∃s. city(c) ∧ major(m) ∧ loc(ℓ) ∧ CA(s) ∧ c1 = m1 ∧ c1 = ℓ1 ∧ ℓ2 = s1
(c) Denotation: ⟦z⟧w = {SF, LA, …}

Figure 3 (Liang et al.): (a) An example of a DCS tree (written in both the mathematical and graphical notation). Each node is labeled with a predicate, and each edge is labeled with a relation. (b) A DCS tree z with only join relations encodes a constraint satisfaction problem, represented here as a lambda calculus formula. For example, the root node label city corresponds to a unary predicate city(c), the right child node label loc corresponds to a binary predicate loc(ℓ) (where ℓ is a pair), and the edge between them denotes the constraint c1 = ℓ1, where the indices correspond to the two labels on the edge. (c) The denotation of z is the set of feasible values for the root node.

Figure 4 (Liang et al.): We use the domain of US geography as a running example. The figure presents an example of a world w (database) in this domain, with tables such as city, loc, state, population, and count. A world maps each predicate to a set of tuples; for example, the depicted world w maps the predicate loc to the set of pairs of places and their containers. Note that functions (e.g., population) are also represented as predicates for uniformity. Some predicates (e.g., count) map to an infinite number of tuples and would be represented implicitly.

2.3 Worlds: In the context of question answering, the DCS tree is a formal specification of the question. To obtain an answer, we still need to evaluate the DCS tree with respect to a database of facts (see Figure 4 for an example). We will use the term world to refer to such a database.
20 /37

- Characteristics of DCS
‣ Logical forms leave a large gap between natural language and the meaning representation
  What is the most populous city in California?
  argmax(λx.city(x) ∧ loc(x, CA), λx.population(x))
‣ A DCS tree is quite close to the sentence's dependency structure
[Figure: the dependency tree of "most populous city in California" (city ← populous ← most; city ← in ← California) next to the DCS tree city -1/1-> ⟨population; argmax⟩ and city -1/1-> loc -2/1-> CA; answer: Los Angeles]
21 /37

- In short…
‣ Deriving logical forms from (sentence, answer) pairs is very hard
- Because of the gap from natural language, meaningful candidates cannot be found during search
‣ DCS is a meaning representation simple enough to be learned even from (sentence, answer) pairs, yet sufficiently expressive
- A meaning representation that mirrors the sentence's tree structure
- The meanings it can express are therefore a subset of lambda calculus
- But perhaps that is enough to express the meanings of naturally occurring sentences?
- The device that keeps the sentence's tree structure and the meaning representation transparent to each other: Mark-Execute

22 /37 - Mark-Execute
Solution: Mark-Execute
Superlatives: divergence between syntactic and semantic scope
  most populous city in California
  Syntax: city ← populous ← most; city ← in ← California
  Semantics: argmax(λx.city(x) ∧ loc(x, CA), λx.population(x))
Problem: the syntactic scope is lower than the semantic scope. If DCS trees look like syntax, how do we get correct semantics?
Mark at syntactic scope (argmax sits on the population node); execute at semantic scope (the marked operation is applied at the city root, x1).
23 /37

- Solution: Mark-Execute
Universal quantification and scope ambiguity:
  Some river traverses every city.
Quantification, narrow (surface scope):
  ∃x.(river(x) ∧ ∀y.(city(y) → traverse(x, y)))
Quantification, wide (inverse scope):
  ∀y.(city(y) → ∃x.(river(x) ∧ traverse(x, y)))
In both readings the quantifiers some and every are marked at their syntactic scope (on the river and city nodes) and executed at the semantic scope (the traverse root); the two readings differ only in the order of execution.
Said to resemble the shift/reset operators for delimited continuations.
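The two readings can be checked directly over a small model. The relations below are invented to show a model where the readings actually differ:

```python
# Toy check of the two scope readings of "some river traverses every city".
rivers = {"r1", "r2"}
cities = {"c1", "c2"}
traverse = {("r1", "c1"), ("r2", "c2")}  # no single river covers every city

# surface scope: ∃x. river(x) ∧ ∀y. (city(y) → traverse(x, y))
surface = any(all((x, y) in traverse for y in cities) for x in rivers)

# inverse scope: ∀y. city(y) → ∃x. (river(x) ∧ traverse(x, y))
inverse = all(any((x, y) in traverse for x in rivers) for y in cities)

print(surface, inverse)  # False True
```

Here the inverse-scope reading is true (each city is crossed by some river) while the surface-scope reading is false, which is exactly the distinction the execution order of the marked quantifiers controls.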

24 /37 - How is it learned?
‣ Basically the same as in the CCG case
- Nothing is assumed about the structure of the DCS tree
- The sentence's dependency structure is not used
Words to predicates (lexical semantics): lexical triggers
  1. String match: CA ⇒ CA
  2. Function words (20 words): most ⇒ argmax
  3. Nouns/adjectives: city ⇒ {city, state, river, population}
Function words and some other words are given their correct predicates by hand.
[Figure: "What is the most populous city in CA?" with the candidate predicates triggered under each word]
25 /37
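The three trigger rules above amount to a small lookup. The trigger tables below are toy stand-ins (Liang et al. use roughly 20 hand-written function-word entries and the domain's predicate inventory):

```python
# Sketch of the three lexical-trigger rules (toy tables, not the real lexicon).
DOMAIN_CONSTANTS = {"CA": "CA"}                    # rule 1: string match
FUNCTION_WORDS = {"most": ["argmax"]}              # rule 2: ~20 hand-written
NOUN_ADJ_PREDICATES = ["city", "state", "river", "population"]
NOUNS_ADJS = {"city", "populous", "state", "river"}

def trigger(word):
    """Candidate predicates for one word, by the three rules."""
    if word in DOMAIN_CONSTANTS:        # 1. string match against constants
        return [DOMAIN_CONSTANTS[word]]
    if word in FUNCTION_WORDS:          # 2. hand-given function words
        return FUNCTION_WORDS[word]
    if word in NOUNS_ADJS:              # 3. nouns/adjectives trigger every
        return list(NOUN_ADJ_PREDICATES)  #   domain predicate as a candidate
    return []

print(trigger("most"))  # ['argmax']
print(trigger("CA"))    # ['CA']
print(trigger("city"))  # ['city', 'state', 'river', 'population']
```

Rule 3 is what makes learning possible without a full lexicon: the model must learn to pick the right predicate out of the over-generated candidates.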

- How is it learned? (continued)
‣ Basically the same as in the CCG case
- Exhaustive search with dynamic programming is infeasible
- Beam search: extract the k-best trees per span and update with SGD
Predicates to DCS trees (compositional semantics):
  Ci,j = set of DCS trees for span [i, j], built by combining trees from Ci,k and Ck,j
[Figure: chart over "most populous city in California", combining a tree for "most populous" with a tree for "city in California"]
26 /37
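The chart construction above can be sketched CKY-style. The scoring function and the combination rule below are toy stand-ins for the real feature model and DCS composition:

```python
from itertools import product

# CKY-style beam chart: C[i, j] holds the k-best candidate trees for span [i, j].
def parse(words, triggers, combine, score, k=5):
    n = len(words)
    C = {}
    for i in range(n):                       # single-word spans from triggers
        C[i, i + 1] = triggers(words[i])[:k]
    for length in range(2, n + 1):           # longer spans, bottom-up
        for i in range(n - length + 1):
            j = i + length
            beam = []
            for split in range(i + 1, j):    # combine the two sub-beams
                for l, r in product(C[i, split], C[split, j]):
                    beam.extend(combine(l, r))
            C[i, j] = sorted(beam, key=score, reverse=True)[:k]  # keep k-best
    return C[0, n]

# Toy instantiation: "trees" are just bracketed strings, scored by length.
triggers = lambda w: [w]
combine = lambda l, r: [f"({l} {r})"]
score = len
print(parse("most populous city".split(), triggers, combine, score))
```

With answers as supervision, SGD then rewards the chart entries whose denotation matches the gold answer, exactly as in the CCG setting.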

- Experiments: GEO data
Training data: 600 sentences, each paired with a logical form or an answer
  what states does the ohio river run through
  (lambda $0 e (and (state:t $0) (loc:t ohio_river:r $0)))
  what states surround kentucky
  (lambda $0 e (and (state:t $0) (next_to:t $0 kentucky:s)))
  what is the capital of states that have cities named durham
  (lambda $0 e (and (capital:t $0) (exists $1 (and (state:t $1) (exists $2 (and (city:t $2) (named:t $2 durham:n) (loc:t $2 $1))) (loc:t $0 $1)))))
  which is the highest peak not in alaska
  (argmax $0 (and (mountain:t $0) (not (loc:t $0 alaska:s))) (elevation:i $0))
‣ Contains somewhat complex constructions (conjunction, superlatives, negation, etc.)
‣ The vocabulary is small
- Word types: 280
- Predicates: 48

27 /37 - Comparison
On Geo: 600 training examples, 280 test examples

System   Description                                    Test accuracy
zc05     CCG [Zettlemoyer & Collins, 2005]              79.3%
zc07     relaxed CCG [Zettlemoyer & Collins, 2007]      86.1%
kzgs10   CCG w/ unification [Kwiatkowski et al., 2010]  88.9%
dcs      our system                                     88.6%
dcs+     our system                                     91.1%

28 /37 - Summary so far
‣ Two approaches to supervised QA
- CCG family: learn a CCG model from (sentence, logical form) pairs
- DCS family: learn a DCS model from (sentence, answer) pairs
‣ DCS performs better, but lexical-level hints have to be given by hand
- The CCG family can be tuned from the given logical forms, e.g., with IBM Model 1
‣ Where things are heading
- Extending QA systems to web scale (Freebase)
- The CCG family, too, can now learn without logical forms being given directly
- Uses of DCS outside QA

29 /37 - Dataset comparison
We want web-scale QA (Berant et al. ’13; Kwiatkowski et al. ’13; Berant and Liang ’14)
Free917 [Cai & Yates, 2013]: 917 examples, 2,036 word types
  What is the engine in a 2010 Ferrari California?
  What was the cover price of the X-men Issue 1?
  • Questions generated from Freebase facts
WebQuestions [Berant et al. ’13]: 5,810 examples, 4,525 word types
  What character did Natalie Portman play in Star Wars?
  What kind of money to take to Bahamas?
  What did Edward Jenner do for a living?
  • Questions generated from Google ⇒ less formulaic
‣ Until now we dealt with relatively clean data (and a small vocabulary)
‣ Can these systems be scaled to databases built from the web?

30 /37 - Freebase knowledge graph
[Figure (Berant et al. ’13): a fragment of the Freebase graph around BarackObama: Spouse → Event8 (Type: Marriage, StartDate: 1992.10.03) → MichelleObama (Gender: Female); PlacesLived → Event21 (Location: Chicago) and Event3 (Location: Honolulu); PlaceOfBirth: Honolulu (Type: City, ContainedBy: Hawaii (Type: USState, ContainedBy: UnitedStates)); DateOfBirth: 1961.08.04; Profession: Politician; Type: Person]
41M entities (nodes)
19K properties (edge labels)
596M assertions (edges)
Queries can be issued via SPARQL.

31 /37 - Scaling Semantic Parsers with On-the-fly Ontology Matching
Tom Kwiatkowski, Eunsol Choi, Yoav Artzi, Luke Zettlemoyer. Computer Science & Engineering, University of Washington.

What is hard here?
‣ There are many predicates, and mismatches arise between them and natural-language text (Berant et al. ’13)
[Figure: alignments of phrases to Freebase constants, e.g. "What language do people in Brazil use" against Brazil, Type.HumanLanguage, Type.ProgrammingLanguage, BrazilFootballTeam, and an entity mention against BarackObama, TopGun, Type.Country, Profession.Lawyer, PeopleBornHere, InventorOf, …]
- Unlike GEO, we cannot enumerate all the predicates and learn them
‣ Which predicates should be used is domain dependent
  Q1: What is the population of Seattle?
  Q2: How many people live in Seattle?
  MR1: λx.population(Seattle, x)   ← Freebase accepts only this one
  MR2: count(λx.person(x) ∧ live(x, Seattle))

From the paper's abstract: We consider the challenge of learning semantic parsers that scale to large, open-domain problems, such as question answering with Freebase. In such settings, the sentences cover a wide variety of topics and include many phrases whose meaning is difficult to represent in a fixed target ontology. For example, even simple phrases such as 'daughter' and 'number of people living in' cannot be directly represented in Freebase, whose ontology instead encodes facts about gender, parenthood, and population. In this paper, we introduce a new semantic parsing approach that learns to resolve such ontological mismatches. The parser is learned from question-answer pairs, uses a probabilistic CCG to build linguistically motivated logical-form meaning representations, and includes an ontology matching model that adapts the output logical forms for each target ontology. Experiments demonstrate state-of-the-art performance on two benchmark semantic parsing datasets, including a nine point accuracy improvement on a recent Freebase QA corpus.

From the introduction: A semantic parser might aim to construct MR1 for Q1 and MR2 for Q2; these pairings align constants (count, person, etc.) directly to phrases ('How many,' 'people,' etc.). Unfortunately, few ontologies have sufficient coverage to support both meaning representations; many QA databases would only include the population relation required for MR1. Most existing approaches would, given this deficiency, simply aim to produce MR1 for Q2, thereby introducing significant lexical ambiguity that complicates learning. Such ontological mismatches become increasingly common as domain and language complexity increases. The proposed parser first constructs a linguistically motivated, domain-independent meaning representation (for example, MR1 for Q1 and MR2 for Q2 above), and then uses a learned ontology matching model to transform this representation for the target ontology.
32 /37

ontology matching model to transform this represen- - DCS 系のアプローチ Berant et al.’13

Berant and Liang’14

• Intersection: If u1 and u2 are both unaries,

Type.Location u PeopleBornHere.BarackObama

then u

intersection

1 u u2 (e.g., Profession.Scientist u

PlaceOfBirth.Seattle) denotes set intersec-

Type.Location

was

PeopleBornHere.BarackObama

?

tion: Ju

join

lexicon

1 u u2K

.

K = Ju1KK \ Ju2KK

• Aggregation: If u is a unary, then count(u)

where

BarackObama

PeopleBornHere

denotes the cardinality:

Jcount(u)K

lexicon

lexicon

K

=

{|JuK

Obama

born

K|}.

As a final example, “number of dramas star-

‣ 機能が制限された DCS (basic λ- ‐DCS) を用いている

ring Tom Cruise” in lambda calculus would

Figure 2: An example of a derivation d of the utterance

be represented as count( x.Genre(x, Drama) - ‐

^ Mark- ‐Exec

“Wher u

e te などはいつの間にか消え

was Obama born?” and its ている

sub-derivations, each

9y.Performance(x, y) ^ Actor(y, TomCruise));

labeled with composition rule (in blue) and logical form

- ‐ Freebase のクエリは単に知識を問うことしかできず、量子化などを表現

in

-DCS, it is simply

(in red). The derivation

count(Genre.Drama u

d skips the words “was” and “?”.

する必要性がない（できない）から？

Performance.Actor.TomCruise).

It is useful to think of the knowledge base K - ‐

as 熟語を選ぶ難しさが増したが、構造の導出はより簡単に？

ily over-generates. We instead rely on features and

a directed graph in which entities are nodes and

learning to guide us away from the bad derivations.

33 /37

properties are labels on the edges. Then simple -

DCS unary logical forms are tree-like graph patterns

Modeling Following Zettlemoyer and Collins

which pick out a subset of the nodes.

(2005) and Liang et al. (2011), we define a

discriminative log-linear model over derivations

2.3 Framework

d 2 D(x) given utterances x: p✓(d | x) =

exp

Given an utterance

{ (x,d)>✓}

x, our semantic parser constructs

P

, where (x, d) is a feature

d02D(x) exp{ (x,d0)>✓}

a distribution over possible derivations D(x). Each

vector extracted from the utterance and the deriva-

derivation d 2 D(x) is a tree specifying the appli- tion, and ✓ 2 Rb is the vector of parameters to

cation of a set of combination rules that culminates

be learned. As our training data consists only of

in the logical form d.z at the root of the tree—see

question-answer pairs (xi, yi), we maximize the log-

Figure 2 for an example.

likelihood of the correct answer (Jd.zKK = yi), sum-

ming over the latent derivation d. Formally, our

Composition Derivations are constructed recur-

training objective is

sively based on (i) a lexicon mapping natural lan-

n

guage phrases to knowledge base predicates, and (ii)

X

X

O(✓) =

log

p✓(d | xi).

(1)

a small set of composition rules.

i=1

d2D(x):Jd.zK

More specifically, we build a set of derivations for

K=yi

each span of the utterance. We first use the lexicon to

Section 4 describes an approximation of this ob-

generate single-predicate derivations for any match-

jective that we maximize to choose parameters ✓.

ing span (e.g., “born” maps to PeopleBornHere).

Then, given any logical form

3 Approach

z1 that has been con-

structed over the span [i1 : j1] and z2 over a non-

Our knowledge base has more than 19,000 proper-

overlapping span [i2 : j2], we generate the following

ties, so a major challenge is generating a manage-

logical forms over the enclosing span [min(i1, i2) :

able set of predicates for an utterance. We propose

max(j1, j2)]: intersection z1 u z2, join z1.z2, ag-

two strategies for doing this. First (Section 3.1),

gregation z1(z2) (e.g., if z1 = count), or bridging

we construct a lexicon that maps natural language

z1 u p.z2 for any property p 2 P (explained more in

phrases to logical predicates by aligning a large text

Section 3.2).3

corpus to Freebase, reminiscent of Cai and Yates

Note that the construction of derivations D(x)

(2013). Second, we generate logical predicates com-

allows us to skip any words, and in general heav-

patible with neighboring predicates using the bridg-

3

ing operation (Section 3.2). Bridging is crucial when

We also discard logical forms are incompatible according

to the Freebase types (e.g., Profession.Politician u

aligning phrases is difficult or even impossible. The

Type.City would be rejected).
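The graph-pattern reading of simple λ-DCS can be sketched over a toy knowledge graph. The triples and the form encoding below are invented for illustration (they are not Freebase data or Berant et al.'s code):

```python
# Minimal sketch of evaluating simple λ-DCS unaries over a toy graph.
K = {  # (subject, property, object) assertions
    ("Honolulu", "PeopleBornHere", "BarackObama"),
    ("Honolulu", "Type", "Location"),
    ("Hawaii", "Type", "Location"),
}

def unary(form):
    """Denotation of a unary logical form: a set of entities."""
    op = form[0]
    if op == "entity":                      # a single entity {e}
        return {form[1]}
    if op == "join":                        # p.u : subjects p-related to u
        _, p, u = form
        objs = unary(u)
        return {s for (s, prop, o) in K if prop == p and o in objs}
    if op == "and":                         # u1 ⊓ u2 : set intersection
        return unary(form[1]) & unary(form[2])
    if op == "count":                       # count(u) : cardinality
        return {len(unary(form[1]))}
    raise ValueError(op)

# Type.Location ⊓ PeopleBornHere.BarackObama ("Where was Obama born?")
q = ("and", ("join", "Type", ("entity", "Location")),
            ("join", "PeopleBornHere", ("entity", "BarackObama")))
print(unary(q))  # {'Honolulu'}
```

Each `join` walks one edge label backwards through the graph, so a nested unary really is a tree-like graph pattern, as the paper excerpt says.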

- Learning from answers with CCG, too (Kwiatkowski et al. ’13)
2-Step Semantic Parsing
Step 1, domain-independent parsing: a domain-independent logical form is built first. The CCG lexicon is not learned (it is given by hand to a large extent).
  How many | people | live | in | Seattle
  S/(S\NP)/N | N | S\NP | S\S/NP | NP
  λf.λg.λx.eq(x, count(λy.g(y) ∧ f(y))) | λx.people(x) | λx.λev.live(x, ev) | λx.λf.∃ev.in(ev, x) ∧ f(ev) | seattle
  ⊢ S : λx.eq(x, count(λy.∃ev.people(y) ∧ live(y, ev) ∧ in(ev, seattle)))
Step 2, ontology matching:
  Structure match: collapse the structure, e.g. to λx.how_many_people_live_in(seattle, x) (the labels signify source words, not semantic constants)
  Constant match: map those labels onto constants of the target ontology, e.g. λx.population(seattle, x)
Everything up to and including the final logical form is learned as a latent variable.
34 /37
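The ontology-matching step can be caricatured as a rewrite from a collapsed, word-derived predicate to a target-ontology predicate. The rewrite table below is an invented stand-in for the learned matching model:

```python
# Caricature of step 2 above: rewrite a collapsed domain-independent
# predicate into a Freebase-style predicate (toy table, not a learned model).
ONTOLOGY_REWRITES = {
    "how_many_people_live_in": "population",
    "number_of_people_living_in": "population",
}

def constant_match(pred, args):
    """Map a word-derived predicate onto the target ontology, if known."""
    return (ONTOLOGY_REWRITES.get(pred, pred), args)

print(constant_match("how_many_people_live_in", ("seattle", "x")))
# ('population', ('seattle', 'x'))
```

In the real system this choice is probabilistic and is trained end-to-end: a rewrite is rewarded whenever the resulting query returns the gold answer.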

- Is the gap between the two approaches closing?
‣ The traditional CCG family
- Learns from (sentence, logical form) pairs
- Needs no other tuning (independent of the language and of the logical system)
‣ Kwiatkowski et al. ’13
- CCG derivations receive some hand-given hints (similar to DCS)
- The derived logical form is probabilistically rewritten to match Freebase's representation
- If the query returns the correct answer, the whole derivation leading to it is treated as correct

35 /37 - DCS and CCG beyond QA
‣ CCG has begun to be used over a wide range of problems
- Problems where a logical form (a program) is learned for an input
- Building dialogue systems from conversation logs (Artzi and Zettlemoyer ’11)
- Guiding robots (Artzi and Zettlemoyer ’13)
[Figure (Artzi et al. ’13): a numbered map with the current position; instructions such as "go to the chair" paired with λa.move(a) ∧ to(a, ιx.chair(x)); events can be modified by adverbials. Learning this correspondence is the goal; the logical forms are not given directly, but executing one reveals whether it was correct.]

36 /37 - DCS and CCG beyond QA (continued)
‣ Executing a DCS meaning representation depends on having a database
- Meanings are represented by Cartesian products of sets from the database
‣ Tian, Miyao and Matsuzaki ’14 (ACL)
- Applies the DCS framework to recognizing textual entailment
  [Figure: T: love(SUBJ: Mary, OBJ: dog) ⊂ H: have(SUBJ: Tom, OBJ: animal)]
- Shows how DCS can be used as a meaning representation even without a database (abstract denotations)
  [Figure 1 (Tian et al.): the DCS tree of "students read books". Figure 2: DCS trees of "Mary loves every dog" (Left-Up), "Tom has a dog" (Left-Down), and "Tom has an animal that Mary loves" (Right). Table 1: databases of student, book, and read.]
‣ Is CCG, with its longer history, easier to apply to new problems?
- DCS is simpler and fits sentence structure more closely
- Which representation is better for which problem, by how much, and why?
37 /37
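The abstract denotation idea can be spelled out with plain sets. The toy data below are invented, and Tian et al. manipulate the algebraic formula itself rather than ever enumerating these sets:

```python
from itertools import product

# F1 = read ∩ (student_SUBJ × book_OBJ), computed concretely on toy data.
student = {"John", "Mary"}
book = {"Ulysses", "A Tale of Two Cities"}
read = {("John", "Ulysses"), ("Mary", "A Tale of Two Cities"),
        ("John", "New York Times")}

# student_SUBJ × book_OBJ: all (SUBJ, OBJ) pairs drawn from the two sets
candidates = set(product(student, book))
F1 = read & candidates
print(F1)  # the reading events whose SUBJ is a student and whose OBJ is a book
```

The entailment computation works on the formula `read ∩ (student_SUBJ × book_OBJ)` symbolically, which is why no concrete database is needed at inference time.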

From Tian et al.:
2.1 DCS trees: DCS trees have been proposed to represent natural language semantics with a structure similar to dependency trees (Liang et al., 2011) (Figure 1). For the sentence "students read books", imagine a database consisting of three tables: a set of students, a set of books, and a set of "reading" events (Table 1). The DCS tree in Figure 1 is interpreted as a command for querying these tables, obtaining the "reading" entries whose "SUBJ" field is a student and whose "OBJ" field is a book. The result is a set {John reads Ulysses, …}, called a denotation. DCS trees can be extended to represent linguistic phenomena such as quantification and coreference, with additional markers introducing additional operations on tables. Figure 2 shows an example with the quantifier "every", which is marked on the edge (love)OBJ-ARG(dog) and interpreted as a division operator (§2.2). Each node of a DCS tree is a content word of the sentence, each edge represents a semantic relation between two words, and the labels on both ends of an edge, such as SUBJ (subject) and OBJ (object), are the semantic roles of the corresponding words (the role ARG is specifically defined for denoting a nominal predicate).
2.2 Abstract denotations: To formulate the database-querying process defined by a DCS tree, formal semantics is given to DCS trees via relational algebra (Codd, 1970). Meanings of sentences are represented by abstract denotations, and logical relations among sentences are computed as relations among their abstract denotations; inference is thus performed over formulas of relational algebra without computing database entries explicitly. For example, the semantics of "students read books" is given by the abstract denotation
  F1 = read ∩ (studentSUBJ × bookOBJ),
where read, student, and book denote the sets represented by these words, and wr represents the set w considered as the domain of the semantic role r (e.g., bookOBJ is the set of books considered as objects). The operators ∩ and × are intersection and Cartesian product, both borrowed from relational algebra. F1 denotes the same set as the DCS tree in Figure 1, i.e. {John reads Ulysses, …}, but F1 itself is an algebraic formula that does not depend on any concrete database. This matters because DCS otherwise assumes databases are explicitly available, which is unrealistic for logical inference on unrestricted texts: we cannot prepare a database for everything in the world. The solution is to redefine DCS trees without the aid of any database, introducing, among other constants, a universal set W containing all entities.
- References (1)

‣ Yoav Artzi and Luke S. Zettlemoyer (2011). Bootstrapping semantic parsers from conversations. In EMNLP.
‣ Yoav Artzi and Luke S. Zettlemoyer (2013). Weakly supervised learning of semantic parsers for mapping instructions to actions. In TACL.
‣ Yoav Artzi, Nicholas FitzGerald, and Luke Zettlemoyer (2013). Semantic parsing with Combinatory Categorial Grammars. ACL tutorial.
‣ Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang (2013). Semantic parsing on Freebase from question-answer pairs. In EMNLP.
‣ Jonathan Berant and Percy Liang (2014). Semantic parsing via paraphrasing. In ACL.
‣ Tom Kwiatkowski, Luke S. Zettlemoyer, Sharon Goldwater, and Mark Steedman (2010). Inducing probabilistic CCG grammars from logical form with higher-order unification. In EMNLP.
‣ Tom Kwiatkowski, Luke S. Zettlemoyer, Sharon Goldwater, and Mark Steedman (2011). Lexical generalization in CCG grammar induction for semantic parsing. In EMNLP.
- References (2)

‣ Tom Kwiatkowski, Sharon Goldwater, Luke S. Zettlemoyer, and Mark Steedman (2012). A probabilistic model of syntactic and semantic acquisition from child-directed utterances and their meanings. In EACL.
‣ Tom Kwiatkowski, Eunsol Choi, Yoav Artzi, and Luke S. Zettlemoyer (2013). Scaling semantic parsers with on-the-fly ontology matching. In EMNLP.
‣ Percy Liang, Michael I. Jordan, and Dan Klein (2011). Learning dependency-based compositional semantics. In ACL.
‣ Percy Liang, Michael I. Jordan, and Dan Klein (2013). Learning dependency-based compositional semantics. In Computational Linguistics.
‣ Cynthia Matuszek, Nicholas FitzGerald, Luke S. Zettlemoyer, Liefeng Bo, and Dieter Fox (2012). A joint model of language and perception for grounded attribute learning. In ICML.
‣ Ran Tian, Yusuke Miyao, and Takuya Matsuzaki (2014). Logical inference on dependency-based compositional semantics. In ACL.
- References (3)

‣ Luke S. Zettlemoyer and Michael Collins (2005). Learning to map sentences to logical form: structured classification with probabilistic categorial grammars. In UAI.
‣ Luke S. Zettlemoyer and Michael Collins (2007). Online learning of relaxed CCG grammars for parsing to logical form. In EMNLP.