This page reproduces the content of https://speakerdeck.com/bargava/introduction-to-deep-learning-for-image-processing.

Slide authors who do not wish their slides to be posted here: please contact us via this link.

- 1) Motivation for Machine Learning/Deep Learning

i. Biological Motivation, Hierarchical/Representation Learning

2) Introduction to Artificial Neural Networks/Deep Learning

i. Neuron, Perceptron, Logistic, MLP, Rectified Linear Units

ii. Backpropagation Algorithm, Gradient Descent (including SGD), Mini-batch

3) Image Recognition: Convolutional Neural Networks

i. Convolution

ii. Sub-sampling, Pooling

iii. Dropout

iv. Architecture

4) Challenges in Deep Learning

i. Vanishing Gradients & Local Minima

ii. Overfitting - How do we recognize the digits?
- Model: Inputs → Computation → Outputs

source: http://www.slideshare.net/indicods/deep-learning-with-python-and-the-theano-library - How do we recognize the digits?

k Nearest-Neighbors: use functions that compute relevant information to solve the problem. For each image, find the "most similar" image. Guess that as the label. - How do we recognize the digits?
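The nearest-neighbour idea above can be sketched in a few lines. This is an illustrative sketch only: the toy 4-pixel "images", the `knn_predict` helper, and k=3 are assumptions, not from the slides.

```python
import numpy as np

def knn_predict(train_X, train_y, x, k=3):
    # Distance from x to every training image (flattened pixel vectors).
    dists = np.linalg.norm(train_X - x, axis=1)
    # Indices of the k most similar training images.
    nearest = np.argsort(dists)[:k]
    # Guess the majority label among the k neighbours.
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy "images": 4-pixel vectors, two classes.
train_X = np.array([[0., 0., 0., 0.],
                    [0., 0., 0., 1.],
                    [1., 1., 1., 1.],
                    [1., 1., 1., 0.]])
train_y = np.array([0, 0, 1, 1])
print(knn_predict(train_X, train_y, np.array([1., 1., 0.9, 1.]), k=3))
```

Note there is no model being learned here: every prediction scans the whole training set, which is exactly the hand-crafted-similarity approach the next slides argue against.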

Difficult to enumerate all possible interactions, spatial structure, etc. as hand-coded features. - How is information detected?

How is it stored?

How does it influence recognition? - • Connected network of neurons.

• Communicate by electric and chemical signals.

~10^11 neurons, ~1000 synapses per neuron

• Signals come in via dendrites into the soma.

• Signals go out via the axon to other neurons through synapses. - Kids speak grammatically correct sentences

even before they are taught formal language.

Kids learn after listening to a lot of sentences: associations and structural inferences.

They understand context, e.g. drinking water vs. a river vs. an ocean.

See/hear/feel first. Assimilate.

Build the context hierarchically.

Recognize. Respond. - NEURON diagram: INPUT → Weights → ACTIVATION FUNCTION → OUTPUT - INPUT → OUTPUT: individual elements are weak computational elements.

source: https://en.wikipedia.org/wiki/Artificial_neural_network
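A single artificial neuron of this kind can be sketched as a weighted sum passed through an activation. The step activation and the example weights here are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def neuron(x, w, b):
    """One weak computational element: weighted inputs + activation."""
    z = np.dot(w, x) + b          # weighted input plus bias
    return 1.0 if z > 0 else 0.0  # simple threshold (step) activation

# With these hand-picked weights the unit behaves like a logical AND.
print(neuron(np.array([1., 1.]), np.array([1., 1.]), -1.5))
```

On its own this element can only draw a straight decision boundary; the power comes from connecting many of them in layers, as the next slides describe.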
- » The network is formed by:

– An input layer of source nodes

– One or several hidden layers of processing neurons

– An output layer of processing neuron(s)

» Connections only between adjacent layers

» There are no feedback connections

source: https://en.wikipedia.org/wiki/Artificial_neural_network - The activation function of a node defines

the output of that node, given its input(s).

source: https://en.wikibooks.org/wiki/Artificial_Neural_Networks/Activation_Functions - • Invented in 1957.

• Classifies input data into one of the output classes.

• Online learning possible.

If the weighted input is more than the threshold, classify as 1. Else 0.

source: ASDM Summer School on Deep Learning 2014
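The perceptron's threshold rule and its online learning (one update per data point) can be sketched as below. The OR training data, the learning rate, and the epoch count are illustrative assumptions.

```python
import numpy as np

def perceptron_train(X, y, epochs=10, lr=0.1):
    # Online learning: weights are adjusted after every single example.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Classify as 1 if the weighted input exceeds the threshold, else 0.
            pred = 1.0 if np.dot(w, xi) + b > 0 else 0.0
            # Perceptron update rule: move weights toward misclassified points.
            w += lr * (yi - pred) * xi
            b += lr * (yi - pred)
    return w, b

# Linearly separable toy data: the OR function.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 1.])
w, b = perceptron_train(X, y)
preds = [1.0 if np.dot(w, xi) + b > 0 else 0.0 for xi in X]
print(preds)
```

Because updates happen per example, this also illustrates the "online learning possible" bullet; on data that is not linearly separable (e.g. XOR) this loop never converges, which motivates the multi-layer networks later in the deck.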

- • Output is bounded between 0 & 1

• Symmetric

• Domain: the complete set of real numbers

• Derivative can be quickly calculated

• Smooth and continuous function

• Positive, bounded, strictly positive - Generalization of logistic regression

for multi-class classification. - • Cheap to compute (no products/exponentials)

• Faster training

• Sparser networks

• Bounded below by 0

• Strictly increasing - • Generalization of the rectified linear unit

• Max of k linear functions -> piecewise linear

• At large k, can approximate a non-linear function
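The activation functions covered above (sigmoid, softmax, rectified linear, maxout) can be sketched as follows. The function names and the small maxout example (k = 2 linear pieces) are illustrative, not from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # smooth, bounded in (0, 1)

def softmax(z):
    e = np.exp(z - np.max(z))         # shift for numerical stability
    return e / e.sum()                # probabilities over the classes

def relu(z):
    return np.maximum(0.0, z)         # cheap: no products/exponentials

def maxout(z, W, b):
    """Max of k linear functions of the input -> piecewise linear."""
    return np.max(W @ z + b)

z = np.array([-1.0, 0.0, 2.0])
print(sigmoid(0.0))            # midpoint of the sigmoid
print(softmax(z))              # sums to 1
print(relu(z))                 # negatives clipped to 0
```

Note how relu zeroes out negative inputs, which is what produces the sparser networks mentioned above, and how maxout with k linear pieces reduces to relu when one piece is the zero function.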

source: ASDM Summer School on Deep Learning 2014 - Goal: to find the minimum of the loss function (minimize the error of the model). - Compute the gradient of the loss function w.r.t. the weights.

Backpropagate the training error to generate deltas for all the neurons, from the output layer back through the hidden layers.

Use gradient descent to update the weights. - • Stochastic Gradient Descent: instead of using all of the training data at once, train iteratively on "mini-batches".

• Online Learning: mini-batch size is 1. Weights are adjusted for every single data point. - • Feedforward ANN

• Activation function: mostly sigmoid

• Improvement over the basic perceptron: can classify data that aren't linearly separable - source: http://www.cs.ubc.ca/~nando/
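The pieces above fit together in one sketch: a small sigmoid MLP trained on XOR (which a single perceptron cannot classify) using backpropagation and mini-batch stochastic gradient descent. The network size (2-2-1), learning rate, and epoch count are illustrative assumptions; only a drop in the loss is checked, since convergence to a perfect XOR solution from a random start is not guaranteed.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: not linearly separable.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 2-2-1 feedforward network with small random initial weights.
W1, b1 = rng.normal(0, 1, (2, 2)), np.zeros(2)
W2, b2 = rng.normal(0, 1, (2, 1)), np.zeros(1)

lr = 1.0
losses = []
for epoch in range(2000):
    order = rng.permutation(4)
    for start in (0, 2):                      # mini-batches of size 2
        idx = order[start:start + 2]
        xb, yb = X[idx], y[idx]
        # Forward pass.
        h = sigmoid(xb @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # Backpropagation: deltas from the output layer back to the hidden layer.
        d_out = (out - yb) * out * (1 - out)
        d_hid = (d_out @ W2.T) * h * (1 - h)
        # Gradient descent updates.
        W2 -= lr * h.T @ d_out
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * xb.T @ d_hid
        b1 -= lr * d_hid.sum(axis=0)
    pred = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    losses.append(float(((pred - y) ** 2).mean()))

print(losses[0], losses[-1])   # the loss should decrease during training
```

Shrinking the mini-batch size to 1 would turn this into the online learning variant from the previous slide.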
- An image-processing technique that changes the intensity of a pixel to reflect the intensities of the surrounding pixels.

E.g. image effects like blur, sharpen, and edge detection.

source: https://developer.apple.com/library/ios/documentation/Performance/Conceptual/vImage/ConvolutionOperations/ConvolutionOperations.html
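The convolution just described, plus the pooling/sub-sampling step from the outline, can be sketched as below. The toy image and edge-detecting kernel are illustrative; note that, like most CNN libraries, this computes cross-correlation rather than flipping the kernel.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in most CNN libraries)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output pixel reflects the intensities of its neighbourhood.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling (sub-sampling)."""
    h, w = feature_map.shape
    fm = feature_map[:h - h % size, :w - w % size]
    return fm.reshape(h // size, size, w // size, size).max(axis=(1, 3))

# Toy image: left half dark, right half bright.
image = np.hstack([np.zeros((4, 2)), np.ones((4, 2))])
edges = conv2d(image, np.array([[-1., 1.]]))  # responds at the vertical edge
pooled = max_pool(edges, size=2)
```

Pooling throws away the exact position of each response while keeping its strength, which is both why it gives translation tolerance and why Hinton (quoted below) objects to it.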
- Hinton: "The pooling operation used in convolutional neural networks is a big mistake, and the fact that it works so well is a disaster." - Source: http://deeplearning.net
- A … model? - GoogLeNet

source: arXiv:1409.4842v1 [cs.CV] 17 Sep 2014 - source: ASDM Summer School on Deep Learning 2014
- • Smart weight initialization

• Use ReLU and Maxout units

• Train for a longer time.

• Use restarts. - Instead of learning all the network

weights jointly, build each layer iteratively. - • Start learning with the layer nearest to the inputs.

• Proceed adding layers until the last hidden layer.

• When adding a new layer, the weights from previous layers are kept fixed.

• Since one layer is learned at a time, vanishing gradients are avoided. - Two options for the output:

• Keep pre-trained weights fixed.

• Deep network ≃ feature generator: more useful features are obtained from the inputs.

• The output layer is trained as a perceptron; its inputs are the features obtained.

• Discriminative fine-tuning: optimize jointly over all network weights (e.g. using BP).

• Pre-trained weights seem to be a better choice than random initialization.

• Slower fine-tuning. - Single feature

Divide the feature space into 3 simple bins.

Too simplistic: too much overlap between classes.

source: http://nikhilbuduma.com/2015/03/10/the-curse-of-dimensionality/ - But by adding features to improve separability, the feature space explodes. - Need: Automated feature selection - GOAL: Efficient Reconstruction of Input Data

Use the input as both the input and the output of the network, and train (don't use the target).

Learn the weights using a standard neural network method (e.g. backpropagation). - After learning:

Wenc compresses the information in the data into the hidden units.

Wdec decompresses the information in the hidden units back to its original form (with some loss). - Enforce sparsity.

(A network is sparse if only a few of its units are >0 simultaneously.)

Introduce random noise into the input patterns when training the network. (Only in the inputs!) - Train an autoencoder to reconstruct the training data.
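A single autoencoder of the kind described, trained with backpropagation to reconstruct its own input, can be sketched as below. The 4-2-4 layout, learning rate, and step count are illustrative assumptions; only a drop in reconstruction error is checked.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data to reconstruct: the input is used as both input and target.
X = np.eye(4)                                   # four one-hot patterns

n_in, n_hidden = 4, 2
W_enc = rng.normal(0, 0.5, (n_in, n_hidden))    # compresses input -> hidden
W_dec = rng.normal(0, 0.5, (n_hidden, n_in))    # decompresses hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X):
    H = sigmoid(X @ W_enc)                      # hidden code
    return H, sigmoid(H @ W_dec)                # reconstruction

lr = 1.0
losses = []
for step in range(2000):
    H, R = forward(X)
    losses.append(float(((R - X) ** 2).mean()))
    # Backpropagate the reconstruction error (standard NN training).
    d_out = (R - X) * R * (1 - R)
    d_hid = (d_out @ W_dec.T) * H * (1 - H)
    W_dec -= lr * H.T @ d_out / len(X)
    W_enc -= lr * X.T @ d_hid / len(X)

print(losses[0], losses[-1])   # reconstruction error should drop
```

After training, W_enc plays the compressing role and W_dec the (lossy) decompressing role described earlier; stacking further autoencoders on the hidden code H gives the iterative procedure in the following bullets.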

Proceed iteratively, building new autoencoders that reconstruct the values of the hidden units of the previous stage.

Create a feedforward network using the encoding weights Wenc from all the trained autoencoders. As the output layer, use the training labels.

Fine-tuning: train the last layer or the full network using standard backpropagation. - Underfitted

Model | Good Model | Overfitted Model (figure)

source: http://mathbabe.org/2012/11/20/columbia-data-science-course-week-12-predictive-modeling-data-leakage-model-evaluation/ - •

Weight Decay

• L1/L2 Regularization

• Suitable model architectures (depth and width of the layers)

• Unsupervised Pre-training

• Dropout

• Data Augmentation - • Cripple the network by removing hidden units stochastically

• In practice, a probability of 0.5 works well.

BEFORE DROPOUT

source: http://info.usherbrooke.ca/hlarochelle/neural_networks/content.html - • Cripple the network by removing hidden units stochastically

• In practice, a probability of 0.5 works well.

AFTER DROPOUT | BEFORE DROPOUT

source: http://info.usherbrooke.ca/hlarochelle/neural_networks/content.html - Some ways to augment data:

• Rotation: random angle between 0° and 360°

• Translation: random shift between -10 and 10 pixels

• Rescaling: random scale factor between 1/1.6 and 1.6

• Flipping: yes or no

• Shearing: random angle between -20° and 20°

• Stretching: random stretch factor between 1/1.3 and 1.3 - Compared to CPUs, 20x speedups are typical
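The dropout operation from the slides just above (drop hidden units with probability 0.5) can be sketched as follows. The "inverted" rescaling variant shown here is an assumption; the slides do not specify how train-time and test-time activations are matched.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.5, train=True):
    # Stochastically remove hidden units by zeroing them out.
    if not train:
        return activations                        # full network at test time
    mask = rng.random(activations.shape) >= p     # keep each unit with prob 1 - p
    # "Inverted" scaling keeps the expected activation unchanged.
    return activations * mask / (1.0 - p)

h = np.ones(10)                                   # pretend hidden-layer activations
dropped = dropout(h, p=0.5)
print(dropped)                                    # each entry is either 0.0 or 2.0
```

Like the data augmentation recipes above, this is a regularizer: each mini-batch effectively trains a different thinned network, which discourages co-adaptation of hidden units.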

Source: http://www.nvidia.com/object/what-is-gpu-computing.html - • Accelerated computations on float32 data

• Matrix multiplication, convolution, and large element-wise operations can be accelerated a lot (5-50x)

• Difficult to parallelize dense neural networks efficiently on multiple GPUs (active area of research)

• Convolutional neural networks – unlike dense neural networks – can be run very efficiently on multiple GPUs. Their use of weight sharing makes data parallelism very efficient

• Copying large quantities of data to and from a device is relatively slow.

• NVIDIA has released cuDNN (a CUDA library of deep-learning primitives). - Source: NIPS 2013 Tutorial