This page reproduces the content of http://www.slideshare.net/yuzurukato/neural-turing-machines-43179669 (uploaded 2015/01/04, in Technology).



- A summary of Neural Turing Machines (NTM)

- This is a brief summary of the paper “Neural Turing Machines” (http://arxiv.org/abs/1410.5401), written by A. Graves, G. Wayne, and I. Danihelka, Google DeepMind, London UK.

- “Neural Turing Machines” are, in a single phrase, Neural Networks with the capability of coupling to external memory. The combined system is analogous to a Turing Machine.

- Introduction
- Neural Network

・A Neural Network (NN) learns from a large amount of observational data.

(each data point is a pair of [External Input, External Output])

- Recurrent Neural Network

・A Recurrent Neural Network (RNN) adds directed cycles to the NN, which act as a kind of internal memory.

(The current state is determined by the previous state and the External Input)
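The recurrence described above can be sketched in a few lines of NumPy (the sizes and random weights here are arbitrary illustrations, not values from the paper):

```python
import numpy as np

# Hypothetical sizes; any values work for this sketch.
rng = np.random.default_rng(0)
n_hidden, n_input = 4, 3
W = rng.normal(size=(n_hidden, n_hidden))  # hidden-to-hidden weights (the directed cycle)
U = rng.normal(size=(n_hidden, n_input))   # input-to-hidden weights

def rnn_step(h_prev, x):
    """Current state is a function of the previous state and the external input."""
    return np.tanh(W @ h_prev + U @ x)

h = np.zeros(n_hidden)
for x in rng.normal(size=(5, n_input)):    # a sequence of 5 external inputs
    h = rnn_step(h, x)
```

The directed cycle is exactly the dependence of `h` on its previous value from step to step.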

[Figure: RNN with a directed cycle]

- Neural Turing Machine

・A “Neural Turing Machine” is an NN with the capability of coupling to external memory.

(The Controller is an NN whose parameters determine the coupling to the external memory)

[Figure: Controller coupled to External Memory]

- How to access the external memory

・The Read/Write heads use weightings to access the external memory.

・The weightings are determined by the Controller's parameters.

・The parameters are learned from a large amount of external I/O data.

[Figure: The Controller (an NN with parameters for adjusting the weightings) mediates between External Input/Output and the Read/Write heads, which use weighted access to the External Memory, an N × M matrix (N locations, each an M-size vector). The Write head uses an erase vector e and an add vector a.]

- How to update the weightings
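Per the paper, the read head takes a weighted sum of memory rows and the write head first erases, then adds. A minimal NumPy sketch (the function names and toy values are illustrative, not from the paper):

```python
import numpy as np

N, M = 8, 4                        # N memory locations, each an M-size vector
memory = np.zeros((N, M))
w = np.zeros(N)
w[2] = 1.0                         # weighting over locations (here sharply focused on row 2)

def read(memory, w):
    """Read head: weighted sum of memory rows, r = sum_i w_i * M_i."""
    return w @ memory

def write(memory, w, e, a):
    """Write head: erase then add, M_i <- M_i * (1 - w_i * e) + w_i * a."""
    memory = memory * (1 - np.outer(w, e))
    return memory + np.outer(w, a)

e = np.ones(M)                     # erase vector (fully erase the addressed row)
a = np.array([1.0, 2.0, 3.0, 4.0]) # add vector
memory = write(memory, w, e, a)
r = read(memory, w)                # -> array([1., 2., 3., 4.])
```

Because both operations are weighted sums, they are differentiable, which is what lets the coupling parameters be learned by gradient descent.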

Content Addressing:

Weightings adjusted based on the content at each location.

Interpolation:

Determines how much of the previous weighting to retain.

Convolutional Shift and Sharpening:

Weightings adjusted based on the location in memory.

- Application
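The addressing pipeline on the previous slide (content addressing → interpolation → convolutional shift → sharpening) can be sketched as one function. This is a simplified sketch; the toy memory and parameter values are illustrative, not from the paper:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def address(memory, key, beta, g, shift, gamma, w_prev):
    """Sketch of NTM addressing:
    content addressing -> interpolation -> convolutional shift -> sharpening."""
    # 1. Content addressing: cosine similarity of key to each row, sharpened by beta.
    sim = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    w_c = softmax(beta * sim)
    # 2. Interpolation: gate g in [0, 1] blends in the previous weighting.
    w_g = g * w_c + (1 - g) * w_prev
    # 3. Convolutional shift: circular convolution with the shift distribution.
    n = len(w_g)
    w_s = np.array([sum(w_g[(i - j) % n] * shift[j] for j in range(n)) for i in range(n)])
    # 4. Sharpening: raise to gamma >= 1 and renormalise.
    w = w_s ** gamma
    return w / w.sum()

N, M = 5, 3
memory = np.eye(N, M)               # toy memory: the first three rows are unit vectors
no_shift = np.zeros(N)
no_shift[0] = 1.0                   # shift distribution concentrated on "no shift"
w = address(memory, key=memory[1], beta=50.0, g=1.0,
            shift=no_shift, gamma=1.0, w_prev=np.full(N, 1.0 / N))
# With a strong content focus (beta=50), the weighting concentrates on location 1.
```

Content addressing and the shift mechanism together let the heads combine content-based and location-based access, which the copy and sort experiments below rely on.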
- Copy
- Result of copy algorithm

・NTM learns some form of copy algorithm.

・NTM performs better than LSTM (a kind of RNN).

・Even the NTM copy algorithm makes some mistakes on long sequences (as indicated by the red arrow).

[Figures: NTM and LSTM outputs, which are supposed to be copies of the targets]

- How NTM uses the external memory for the copy algorithm

・All weightings focus on a single location.

・The read locations exactly match the write locations.

[Figure: External Inputs/Outputs, the vectors added to / read from Memory, and the Write/Read weightings]

- Repeat Copy
- How NTM uses the external memory for the repeat copy algorithm

・All weightings focus on a single location.

・The locations written by the write head are read repeatedly.

- Result of repeat copy algorithm

・NTM learns some form of repeat copy algorithm.

- Associative Recall
- Results of associative recall algorithm

・NTM correctly produces the item in the red box after seeing the item in the green box.

- Dynamic N-Grams

(Predict the next bit from the previous N bits)

- Results of Dynamic N-Grams

・NTM predicts the next bit almost as well as the optimal estimator.

Optimal estimator: P(B = 1) = (N1 + 0.5) / (N1 + N0 + 1), where N1 and N0 are the numbers of ones and zeros seen in the previous c bits.

- Priority Sort
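The optimal estimator referred to above is, per the paper, the Bayesian estimate P(B = 1) = (N1 + 0.5) / (N1 + N0 + 1) over the previous c bits. A minimal sketch (the function name is illustrative):

```python
def optimal_next_bit_prob(bits, c):
    """Optimal Bayesian estimate of P(next bit = 1) given the previous c bits:
    (N1 + 0.5) / (N1 + N0 + 1), where N1/N0 count the ones/zeros in the context."""
    context = bits[-c:]
    n1 = sum(context)
    n0 = len(context) - n1
    return (n1 + 0.5) / (n1 + n0 + 1)

p = optimal_next_bit_prob([1, 1, 0, 1, 1], 4)  # -> 3.5 / 5 = 0.7
```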
- Results of Priority Sort

・The write head writes to locations according to a linear function of the priority.

・The read head reads from locations in increasing order.

- Conclusion
- Conclusion

・“Neural Turing Machines” are, in a single phrase, Neural Networks with the capability of coupling to external memory.

・We see this capability of using external memory through the copy, repeat copy, associative recall, dynamic N-grams, and priority sort applications.

・Readers who are really interested in this summary are referred to the original paper (http://arxiv.org/abs/1410.5401).