
Parsing Natural Scenes and Natural Language with Recursive Neural Networks

Richard Socher, Cliff Chiung-Yu Lin, Andrew Y. Ng, Christopher D. Manning

ICML 2011

Jie Cao

Outline

• Context
• Recursive Neural Network Definition
• Input Representation
• Output
• Greedy Structure Predicting RNNs
• Loss Function
• Max-Margin Framework
• Backpropagation Through Structure
• L-BFGS

• Experiment and Improved RNN

Recursive vs Recurrent NN

f: X→Y (Input X)

• Map a phrase into vector space

Word Embedding Matrix

• each word is represented as a dense vector
• learned from co-occurrence statistics

Collobert, R. and Weston, J. A unified architecture for natural language processing: deep neural networks with multitask learning. In ICML, 2008.

Input Representation for Scene Image

Each segment i = 1, ..., N_segs in an image comes with a feature vector F_i: 119 features for every segment, with 78 segments per image. The features are mapped into an n-dimensional "semantic" space:

a_i = f(W_sem F_i + b_sem)

where W_sem is the matrix of parameters we want to learn, b_sem is a bias term, and f is applied element-wise; it can be any sigmoid-like function (the slides use the original sigmoid).
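The feature-to-semantic-space mapping described above can be sketched in NumPy. The dimensions (119 features, 78 segments) follow the slide; the semantic dimension n and the random initialization are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    # element-wise sigmoid-like activation, as on the slide
    return 1.0 / (1.0 + np.exp(-x))

def semantic_representation(F, W_sem, b_sem):
    """Map raw segment features into the n-dim 'semantic' space:
    a_i = f(W_sem @ F_i + b_sem), computed for all segments at once."""
    # F: (n_segs, 119), W_sem: (n, 119), b_sem: (n,)
    return sigmoid(F @ W_sem.T + b_sem)

rng = np.random.default_rng(0)
n, n_feats, n_segs = 100, 119, 78           # 119 features, 78 segments per image
F = rng.standard_normal((n_segs, n_feats))  # one feature row per segment
W_sem = 0.01 * rng.standard_normal((n, n_feats))
b_sem = np.zeros(n)
A = semantic_representation(F, W_sem, b_sem)
print(A.shape)  # (78, 100): one n-dim semantic vector per segment
```

In practice W_sem and b_sem would be learned jointly with the rest of the network; random values here only demonstrate the shapes involved.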

Gould, S., Fulton, R., and Koller, D. Decomposing a Scene into Geometric and Semantically Consistent Regions. In ICCV, 2009.

f: X→Y (Output Y)

• For Visual Parser:

• A visual tree is correct if all adjacent segments that belong to the same class (all segments are labeled) are merged into one super segment before any merges occur with super segments of different classes.
• It does not matter how object parts are internally merged or how complete, neighboring objects are merged into the full scene image.
• Hence there is a set of correct trees.

• For Language Parser:
• Y(x) only has one element, the annotated ground-truth tree: Y(x) = {y}

How do we evaluate the error between the true tree and a proposed tree Y'? (Loss Function)

Recursive NN Definition

• new representation of parent(i, j): p = f(W [c_i; c_j] + b)
• new score of parent(i, j): s = W_score · p
• after each merge, recursively add the new parent to the set of candidates and update the adjacency matrix
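One merge step can be sketched as follows, assuming the standard RNN formulation: the parent vector is a nonlinearity applied to the concatenated children, and its score is a linear function of that vector. W, b, and W_score are illustrative random parameters, and tanh stands in for the sigmoid-like f.

```python
import numpy as np

def merge(c_i, c_j, W, b, W_score):
    """Compute parent representation and merge score for one adjacent pair."""
    children = np.concatenate([c_i, c_j])  # [c_i; c_j], shape (2n,)
    p = np.tanh(W @ children + b)          # parent vector, shape (n,)
    score = float(W_score @ p)             # scalar score for this merge
    return p, score

n = 8
rng = np.random.default_rng(1)
W = 0.1 * rng.standard_normal((n, 2 * n))
b = np.zeros(n)
W_score = rng.standard_normal(n)
c_i, c_j = rng.standard_normal(n), rng.standard_normal(n)
p, s = merge(c_i, c_j, W, b, W_score)
print(p.shape, s)
```

Because the parent p has the same dimensionality as its children, it can itself be merged again, which is what makes the network recursive.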

Potential Adjacent Pairs

Greedy Structure Predicting RNNs

Parsing a sentence
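The greedy procedure for a sentence can be sketched as: score every adjacent pair, merge the best-scoring one into a parent node, treat the parent as a new leaf, and repeat until one node remains. The random parameters below stand in for a trained network.

```python
import numpy as np

def greedy_parse(leaves, W, b, W_score):
    """Greedy RNN parsing: repeatedly merge the best-scoring adjacent pair.
    leaves: list of n-dim vectors; returns (root vector, nested index tree)."""
    nodes = list(leaves)
    trees = list(range(len(leaves)))  # leaf indices as the initial trees
    while len(nodes) > 1:
        # score every adjacent pair (for a sentence, adjacency is word order)
        scores = []
        for k in range(len(nodes) - 1):
            children = np.concatenate([nodes[k], nodes[k + 1]])
            p = np.tanh(W @ children + b)
            scores.append((float(W_score @ p), k, p))
        _, k, p = max(scores)               # greedily pick the best merge
        nodes[k:k + 2] = [p]                # the parent replaces its children
        trees[k:k + 2] = [(trees[k], trees[k + 1])]
    return nodes[0], trees[0]

n = 8
rng = np.random.default_rng(2)
W, b = 0.1 * rng.standard_normal((n, 2 * n)), np.zeros(n)
W_score = rng.standard_normal(n)
root, tree = greedy_parse([rng.standard_normal(n) for _ in range(5)], W, b, W_score)
print(tree)  # a fully binary nesting of the 5 leaf indices
```

For images the candidate set is the adjacency matrix over segments rather than the linear word order, but the merge-and-rescore loop is the same.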
Category Classification in the RNN

Each node of the tree built by the RNN has a distributed feature representation associated with it. We can leverage this representation by adding a simple softmax layer to each RNN parent node (after removing the scoring layer) to predict class labels.

Loss Function for Language
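Label prediction at a node can then be sketched as a softmax on top of the node's vector; W_label is an illustrative, untrained parameter matrix and the number of classes is arbitrary here.

```python
import numpy as np

def predict_label(p, W_label):
    """Softmax over classes, computed from a node's distributed representation p."""
    logits = W_label @ p
    e = np.exp(logits - logits.max())  # numerically stabilized softmax
    return e / e.sum()

n, n_classes = 8, 4
rng = np.random.default_rng(3)
W_label = rng.standard_normal((n_classes, n))
probs = predict_label(rng.standard_normal(n), W_label)
print(probs, probs.sum())  # a probability distribution over the 4 classes
```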

For a Constituency Parser (phrase-structure parser), a constituent (non-terminal) is correct only if:

1. it dominates exactly the correct span of words, and
2. it is the correct type of constituent.

(S[1:7]
  (NP[1:1] Jim)
  (VP[2:2] ate)
  (NP[3:4] the cookies)
  (PP[5:7] in
    (NP[6:7] the bowl)))

(S[1:7]
  (NP[1:1] Jim)
  (VP[2:7] ate
    (NP[3:7] the cookies
      (PP[5:7] in
        (NP[6:7] the bowl)))))
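Using the two bracketings above, a Hamming-style mismatch count (constituents that appear in exactly one of the two trees) can be sketched by comparing the trees as sets of labeled spans:

```python
# Each constituent is (label, start, end); the spans follow the bracketings above.
tree_a = {("S", 1, 7), ("NP", 1, 1), ("VP", 2, 2), ("NP", 3, 4),
          ("PP", 5, 7), ("NP", 6, 7)}
tree_b = {("S", 1, 7), ("NP", 1, 1), ("VP", 2, 7), ("NP", 3, 7),
          ("PP", 5, 7), ("NP", 6, 7)}

def constituent_distance(a, b):
    """Count constituents present in exactly one of the two trees."""
    return len(a ^ b)  # symmetric difference of the two constituent sets

print(constituent_distance(tree_a, tree_b))  # 4: the VP/NP attachments differ
```

The two parses agree on S, the subject NP, the PP, and its inner NP, and disagree only on where the PP attaches, so four constituents differ.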

The loss between trees is measured as a Hamming-style distance over their constituents.

Loss Function for Image

For the Visual Parser there is a set of correct trees; a loss Δ(x, l, ŷ) is incurred for proposing a parse ŷ for input x with labels l.

RNN for Structure Prediction

Given the training set, we search for a function f with small expected loss on unseen inputs. T(x) is the set of candidate trees for input x. We assume this problem can be described in terms of a computationally tractable max over a score function s:

f(x) = argmax_{ŷ ∈ T(x)} s(x, ŷ)

How do we define the margin?

Max Margin

• Hard-Margin:
• Soft-Margin: add slack variables to handle non-separable data
• We minimize the hinge loss. The max over the set of true trees Y(x) is needed because an image has more than one correct tree.
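As a sketch in the slides' notation (Δ, T(x), Y(x) as above; any regularization term is omitted), the margin condition and the resulting per-example hinge loss can be written as:

```latex
% Margin condition: every correct tree y must outscore any candidate tree
% \hat{y} by at least the loss \Delta (the soft margin adds slack on top).
s(x_i, y) \;\ge\; s(x_i, \hat{y}) + \Delta(x_i, l_i, \hat{y})
\qquad \forall\, \hat{y} \in T(x_i),\; y \in Y(x_i)

% Resulting hinge loss per example; the max over Y(x_i) reflects that an
% image can have more than one correct tree.
r_i(\theta) \;=\; \max_{\hat{y} \in T(x_i)}
  \bigl[\, s(x_i, \hat{y}) + \Delta(x_i, l_i, \hat{y}) \,\bigr]
  \;-\; \max_{y \in Y(x_i)} s(x_i, y)
```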

Max-Margin Framework

Backpropagation Through Structure
Experiment in ICML 2011

The final unlabeled bracketing F-measure of our language parser is 90.29%, compared to 91.63% for the widely used Berkeley parser (Petrov et al., 2006); development F1 is virtually identical, with 92.06% for the RNN and 92.08% for the Berkeley parser.

Unlike most previous systems, our parser does not provide a parent with information about the syntactic categories of its children. This shows that our learned, continuous representations capture enough syntactic information to make good parsing decisions.

Experiment
Improved RNN: allow a different W for each pair of syntactic categories.

Thanks