This page reproduces the content of http://www.slideshare.net/jchoi7s/dependency-parsing-1968241.


Uploaded about 7 years ago (2009/09/08) in Technology


This report shows what a dependency structure is, why a dependency structure is useful, and how to parse natural sentences to dependency structures. The report describes two state-of-the-art dependency parsers, MaltParser and MSTParser, and shows comparisons between the parsers and ways to integrate them. Finally, it suggests a new parsing algorithm and possible applications using dependency structures.

- Dependency Parsing

Jinho D. Choi

University of Colorado

Preliminary Exam

March 4, 2009

- Contents

• Dependency Structure

- What is dependency structure?

- Phrase structure vs. Dependency structure

- Dependency Graph

• Dependency Parsers

- MaltParser: Nivre’s algorithm

- MSTParser: Edmonds’s algorithm

- MaltParser vs. MSTParser

- Choi’s algorithm

• Applications

- Dependency Structure

• What is dependency?

- A syntactic or semantic relation between lexical items

- Syntactic: NMOD, AMOD, Semantic: LOC, MNR

• Phrase Structure(PS) vs. Dependency Structure(DS)

- Constituents vs. Dependencies

- There are no phrasal nodes in DS.

(Each node in DS represents a word-token.)

- In DS, every node except the root is dependent on exactly one other node.

- Phrase vs. Dependency

She bought a car

[Figure: the phrase-structure tree S → NP(Pro: she) VP(V: bought, NP(Det: a, N: car)) shown next to the dependency tree, where 'bought' heads 'she' (SBJ) and 'car' (OBJ), and 'car' heads 'a' (DET).]

• Not flexible with word-orders

• Language dependent

• No semantic information

- Dependency Graph

• For a sentence x = w1..wn, a dependency graph Gx = (Vx, Ex)

- Vx = {w0 = root, w1, ... , wn},

- Ex = {(wi, r, wj) : wi ≠ wj, wi ∈ Vx, wj ∈ Vx − {w0}, r ∈ Rx}

(Rx = the set of all possible dependency relations in x)

• Well-formed Dependency Graph

- Unique root

- Single head

- Connected

- Acyclic

[Figure: a well-formed dependency graph headed by Root → 'bought', with SBJ and OBJ arcs to 'She' and 'car', and an NMOD arc ('Jinho') and determiner 'a' under 'car'.]

- Projectivity vs. Non-projectivity
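The well-formedness conditions (unique root, single head, connected, acyclic) can be checked mechanically. Below is a minimal sketch of my own (not from the slides); it encodes a graph as a set of (head, dependent) pairs over token ids 1..n, with 0 standing for the artificial root w0:

```python
def is_well_formed(n, arcs):
    """Check the well-formedness conditions for a dependency graph.

    n    : number of word tokens w1..wn (0 stands for the root w0)
    arcs : set of (head, dependent) pairs
    """
    heads = {}
    for h, d in arcs:
        if d == 0:              # w0 may not be a dependent
            return False
        if d in heads:          # single head: at most one head per token
            return False
        heads[d] = h
    if len(heads) != n:         # connected: every token is attached
        return False
    if sum(1 for h in heads.values() if h == 0) != 1:
        return False            # unique root: exactly one dependent of w0
    for d in heads:             # acyclic: every head chain must reach w0
        seen, cur = set(), d
        while cur != 0:
            if cur in seen or cur not in heads:
                return False
            seen.add(cur)
            cur = heads[cur]
    return True

# "She bought a car": bought(2) is the root's child; she(1) and car(4)
# attach to bought; a(3) attaches to car.
print(is_well_formed(4, {(0, 2), (2, 1), (2, 4), (4, 3)}))  # → True
```

Since single-headedness plus full attachment plus acyclicity already imply a tree, the connectedness test reduces to checking that every token has a head.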

• Projectivity means no cross-edges.

[Figure: a projective parse of "She bought a car" (no crossing edges) vs. a non-projective parse of "She bought a car yesterday that was blue", where the arc from 'car' to its relative clause crosses the arc to 'yesterday'.]

• Why projectivity?

- The original sentence can be regenerated with the same word order

- Parsing is less expensive (O(n) vs. O(n²))

- There are not many non-projective relations

- Dependency Parsers
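Projectivity, read as "no cross-edges", can be tested directly. A small sketch of my own over (head, dependent) pairs, with tokens numbered 1..n by surface position and 0 as the root:

```python
from itertools import combinations

def is_projective(arcs):
    """A graph is projective iff no two arcs cross when drawn above the
    sentence, i.e. there is no pair of arc spans (a1, b1), (a2, b2)
    with a1 < a2 < b1 < b2."""
    spans = [tuple(sorted(arc)) for arc in arcs]
    for (a1, b1), (a2, b2) in combinations(spans, 2):
        if a1 < a2 < b1 < b2 or a2 < a1 < b2 < b1:
            return False
    return True

# "She bought a car" is projective; attaching a relative clause to 'car'
# across 'yesterday' ("... a car yesterday that was blue") adds a crossing arc.
print(is_projective({(0, 2), (2, 1), (2, 4), (4, 3)}))  # → True
```

The pairwise check is O(n²) in the number of arcs, which is fine for a demonstration; it is not meant to reflect either parser's internals.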

• Two state-of-the-art dependency parsers

- MaltParser: performed the best in CoNLL 2007 shared task

- MSTParser: performed the best in CoNLL 2006 shared task

• MaltParser

- Developed by Johan Hall, Jens Nilsson, and Joakim Nivre

- Nivre’s algorithm (projective, O(n)), Covington’s algorithm (non-projective, O(n²))

• MSTParser

- Developed by Ryan McDonald

- Eisner’s algorithm (projective, O(k log k)), Edmonds’s algorithm (non-projective, O(kn²))

- Nivre’s Algorithm

• Based on Shift-Reduce algorithm

• S = a stack

• I = a list of remaining input tokens

• A = a set of dependency arcs

Parsing "she bought a car":

• Initialize : S = [], I = [she, bought, a, car], A = {}

• Shift : ‘she’

• Left-Arc : ‘she ← bought’

• Shift : ‘bought’

• Shift : ‘a’

• Left-Arc : ‘a ← car’

• Right-Arc : ‘bought → car’

• Terminate (no need to reduce ‘car’ or ‘bought’)

- Edmonds’s Algorithm
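The shift-reduce trace for "she bought a car" can be reproduced with a small transition loop. The sketch below is my simplification of arc-eager parsing: a gold-arc oracle stands in for the classifier that MaltParser actually learns, and it only handles projective trees:

```python
def nivre_parse(n, gold):
    """Arc-eager shift-reduce parsing of tokens 1..n.

    gold : dict mapping each dependent to its head (0 = root), used here
           as an oracle in place of MaltParser's learned classifier.
    Returns A, the set of (head, dependent) arcs (root arcs omitted,
    as in the worked example).
    """
    S, I, A = [], list(range(1, n + 1)), set()
    attached = set()
    while I:
        nxt = I[0]
        if S and gold.get(S[-1]) == nxt and S[-1] not in attached:
            attached.add(S[-1])
            A.add((nxt, S.pop()))            # Left-Arc: top(S) <- next(I)
        elif S and gold.get(nxt) == S[-1]:
            attached.add(nxt)
            A.add((S[-1], nxt))              # Right-Arc: top(S) -> next(I)
            S.append(I.pop(0))
        elif S and S[-1] in attached and (
                gold.get(nxt) in S[:-1]
                or any(gold.get(w) == nxt for w in S[:-1])):
            S.pop()                          # Reduce
        else:
            S.append(I.pop(0))               # Shift
    return A

# she(1) bought(2) a(3) car(4); heads: she<-bought, a<-car, bought->car
arcs = nivre_parse(4, {1: 2, 2: 0, 3: 4, 4: 2})
# arcs == {(2, 1), (4, 3), (2, 4)}
```

On this sentence the Reduce transition never fires, matching the slide's note that 'car' and 'bought' need not be reduced before termination.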

• Based on Maximum Spanning Tree algorithm

• Algorithm

1. Build a complete graph

2. Keep only incoming edges with the maximum scores

3. If there is no cycle, goto #5

4. If there is a cycle, treat the cycle as one vertex and update scores for all incoming edges to the cycle; goto #2

5. Break all cycles by removing appropriate edges in the cycle (edges that cause multiple heads)

- Edmonds’s Algorithm
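The five steps map naturally onto a recursive implementation. The sketch below is mine: it keeps each node's best incoming arc, contracts any cycle into one vertex while re-scoring arcs that enter it, recurses, and finally expands the cycle, dropping the arc displaced by the entry arc:

```python
def find_cycle(head):
    """Return one set of nodes forming a cycle under `head`, else None."""
    for start in head:
        seen, v = [], start
        while v in head and v not in seen:
            seen.append(v)
            v = head[v]
        if v in seen:
            return set(seen[seen.index(v):])
    return None

def mst(score, nodes):
    """Chu-Liu/Edmonds maximum spanning arborescence rooted at node 0.

    score : dict, score[h, d] = weight of the arc h -> d
    nodes : set of non-root nodes
    Returns {dependent: head}.
    """
    # Step 2: keep only the maximum-scoring incoming arc of each node.
    head = {d: max((h for (h, x) in score if x == d), key=lambda h: score[h, d])
            for d in nodes}
    cycle = find_cycle(head)
    if cycle is None:                         # Step 3: no cycle -> done
        return head
    # Step 4: contract the cycle to one vertex; re-score arcs touching it.
    c = min(cycle)                            # representative vertex
    total = sum(score[head[v], v] for v in cycle)
    new_score, trace = {}, {}
    for (h, d), s in score.items():
        h2 = c if h in cycle else h
        d2 = c if d in cycle else d
        if h2 == d2:
            continue
        if d in cycle:                        # arc entering the cycle at d
            s += total - score[head[d], d]    # it displaces the cycle arc into d
        if (h2, d2) not in new_score or s > new_score[h2, d2]:
            new_score[h2, d2] = s
            trace[h2, d2] = (h, d)
    sub = mst(new_score, {c if v in cycle else v for v in nodes})
    # Step 5: expand the cycle, keeping all cycle arcs except the displaced one.
    result = {}
    for d2, h2 in sub.items():
        h, d = trace[h2, d2]
        result[d] = h
    for v in cycle:
        if v not in result:
            result[v] = head[v]
    return result

# "John(1) saw(2) Mary(3)" with illustrative scores (similar in spirit to the
# slides' example, not their exact figures): John <-> saw forms a cycle.
scores = {(0, 1): 9, (0, 2): 10, (0, 3): 9,
          (1, 2): 30, (2, 1): 30, (2, 3): 30,
          (1, 3): 11, (3, 1): 3, (3, 2): 0}
tree = mst(scores, {1, 2, 3})
# tree == {2: 0, 1: 2, 3: 2}: root -> saw, saw -> John, saw -> Mary
```

The sketch assumes a dense score table (every node has at least one incoming arc); MSTParser additionally learns the arc scores, which are simply given here.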

[Worked-example slides: a complete score graph over root, 'John', 'saw', 'Mary' (arc scores 10, 9, 9, 20, 0, 30, 30, 11, 3). Keeping only the maximum-scoring incoming arcs leaves the cycle John ↔ saw (30 + 30); the cycle is contracted into one vertex and the scores of arcs entering it are updated (to 40, 29, 31); re-solving on the contracted graph and breaking the cycle yields the final tree root → saw, saw → John, saw → Mary.]

- MaltParser vs. MSTParser

• Advantages

- MaltParser: low complexity, more accurate on short-distance dependencies

- MSTParser: high accuracy, more accurate on long-distance dependencies

• Merge MaltParser and MSTParser in learning stages

- Choi’s Algorithm

• Projective dependency parsing algorithm

- Motivation: do more exhaustive searches than MaltParser but keep the complexity lower than that of MSTParser

- Intuition: in a projective dependency graph, every word can find its head from a word in an adjacent phrase

She bought a car yesterday that was blue

- Searching: start with the edge node, then jump to its head

- Complexity: O(k·n), where k is the number of words in each phrase

- Choi’s Algorithm
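The slides give only the intuition, so the following is a hypothetical reading of the search step (the function name, data shapes, and scoring function are all invented for illustration): starting from the adjacent edge node, climb the chain of heads, keeping the candidate that scores best as a head for the current word.

```python
def find_head(w, start, head, score):
    """Hypothetical sketch of the search step: begin at the adjacent
    (edge) node `start`, and repeatedly jump to the current candidate's
    head, keeping the candidate that scores highest as a head for `w`."""
    best, cand = None, start
    while cand is not None:
        if best is None or score(cand, w) > score(best, w):
            best = cand
        cand = head.get(cand)          # jump to the candidate's head
    return best

# Toy head chain a -> b -> c with made-up attachment scores.
weights = {'a': 0.6, 'b': 0.9, 'c': 0.7}
best = find_head('x', 'a', {'a': 'b', 'b': 'c'}, lambda c, w: weights[c])
# best == 'b'
```

Under this reading, each word inspects only the head chain above its neighbor rather than all n positions, which is consistent with the O(k·n) bound stated above when chains within a phrase have length O(k).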

[Worked-example slides: a sentence A B C D E with a candidate node X; candidate arcs are scored (values between 0.5 and 0.9), and at each step the search compares attaching X to the current node against jumping to that node's head, keeping the highest-scoring attachment.]

- Applications

• Semantic Role Labeling

- CoNLL 2008 and 2009 shared tasks

• Sentence Compression

- Relation extraction

• Sentence Alignment

- Paraphrase detection, machine translation

• Sentiment Analysis