Image Recognition
Bargava Subramanian (@bargava), Data Scientist, Cisco Systems
1) Motivation for Machine Learning/Deep Learning
   i. Biological Motivation, Hierarchical/Representation Learning
2) Introduction to Artificial Neural Networks/Deep Learning
   i. Neuron, Perceptron, Logistic, MLP, Rectified Linear Units
   ii. Backpropagation Algorithm, Gradient Descent (including SGD), Mini-batch
3) Image Recognition: Convolutional Neural Networks
   i. Convolution
   ii. Sub-sampling, Pooling
   iii. Dropout
   iv. Architecture
4) Challenges in Deep Learning
   i. Vanishing Gradients & Local Minima
   ii. Overfitting
How do we recognize the digits? Two classical options:
• k-Nearest Neighbors: for each image, find the "most similar" training image and guess its label.
• Hand-coded features: use functions that compute relevant information to solve the problem.
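A minimal sketch of the k-NN approach, using scikit-learn's bundled 8x8 digits dataset (the dataset and k=3 are illustrative assumptions; any labeled image set works):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)           # 1797 images, flattened to 64 pixels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3)     # "find the most similar images"
knn.fit(X_tr, y_tr)                           # k-NN simply memorizes the training set
print("accuracy:", knn.score(X_te, y_te))     # guess the neighbors' majority label
```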
How do we recognize the digits? Difficult to enumerate all possible interactions, spatial structure, etc. as hand-coded features.
How is information detected? How is it stored? How does it influence recognition?
• Connected network of neurons: ~10^11 neurons, ~1,000 synapses per neuron
• Neurons communicate by electrical and chemical signals
• Signals come in via dendrites into the soma
• Signals go out via the axon to other neurons through synapses
Kids speak grammatically correct sentences even before they are taught formal language. Kids learn after listening to a lot of sentences: associations and structural inferences. They understand context, e.g. drinking water vs. a river vs. an ocean. See/hear/feel first. Assimilate. Build the context hierarchically. Recognize. Respond.
[Diagram: an artificial neuron: weighted inputs feed an activation function, which produces the output]
[Diagram: input/output view of a single neuron] Individually, neurons are weak computational elements.
» The network is formed by:
  – an input layer of source nodes
  – one or several hidden layers of processing neurons
  – an output layer of processing neuron(s)
» Connections exist only between adjacent layers
» There are no feedback connections
source: https://en.wikipedia.org/wiki/Artificial_neural_network
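A minimal forward-pass sketch of such a network (numpy only; the layer sizes and sigmoid activation are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
sizes = [4, 5, 3, 2]                 # input layer, two hidden layers, output layer
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

a = rng.random(4)                    # source nodes: one input vector
for W, b in zip(weights, biases):    # connections only between adjacent layers;
    a = sigmoid(a @ W + b)           # no feedback: information flows forward only
print(a)                             # output layer activations
```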
The activation function of a node defines the output of that node given its input(s).
source: https://en.wikibooks.org/wiki/Artificial_Neural_Networks/Activation_Functions
• Invented in 1957.
• Classifies input data into one of the output classes.
• Online learning is possible.
• Decision rule: if the weighted input is more than the threshold, classify as 1; else 0.
source: ASDM Summer School on Deep Learning 2014
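A minimal sketch of the perceptron decision rule and its online update (the learning rate and toy data are assumptions):

```python
import numpy as np

def predict(w, b, x):
    return 1 if w @ x + b > 0 else 0         # weighted input vs. threshold

rng = np.random.default_rng(0)
X = rng.random((100, 2))
y = (X[:, 0] + X[:, 1] > 1).astype(int)      # a linearly separable toy problem

# Online learning: update the weights after every example.
w, b, lr = np.zeros(2), 0.0, 0.1
for x_i, y_i in zip(X, y):
    err = y_i - predict(w, b, x_i)
    w += lr * err * x_i                      # move the boundary toward mistakes
    b += lr * err
```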
Properties of the logistic (sigmoid) activation:
• Output is bounded between 0 and 1
• Symmetric
• Domain: the complete set of real numbers
• Smooth and continuous function
• Derivative can be quickly calculated
• Derivative is strictly positive and bounded
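For reference, the standard definition of the logistic function and its derivative, which make these properties concrete (not stated on the original slide):

```latex
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
\sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr) \in (0, \tfrac{1}{4}]
```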
Softmax: a generalization of logistic regression for multi-class classification.
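A minimal numerically stable softmax sketch (numpy; the max-subtraction trick is a standard implementation detail, not from the slides):

```python
import numpy as np

def softmax(z):
    z = z - z.max()                  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()               # probabilities over the classes

print(softmax(np.array([2.0, 1.0, 0.1])))   # sums to 1
```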
Rectified Linear Units (ReLU), f(x) = max(0, x):
• Cheap to compute (no products/exponentials)
• Faster training
• Sparser networks
• Bounded below by 0
• Monotonically increasing
• Maxout: a generalization of the Rectified Linear unit
• Max of k linear functions -> piecewise linear
• At large k, it can approximate a non-linear function
source: ASDM Summer School on Deep Learning 2014
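Concretely, a maxout unit computes the following (standard definition, added here for reference):

```latex
h(x) = \max_{j \in \{1,\dots,k\}} \left( w_j^{\top} x + b_j \right)
```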
Goal: find the minimum of the loss function (i.e., minimize the error of the model).
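A minimal mini-batch SGD sketch for a linear model with squared loss (the data, learning rate, and batch size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((1000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=1000)

w, lr, batch = np.zeros(3), 0.1, 32
for epoch in range(20):
    idx = rng.permutation(len(X))            # shuffle once per epoch
    for start in range(0, len(X), batch):
        b = idx[start:start + batch]
        grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)   # gradient of the loss
        w -= lr * grad                                   # step downhill
print(w)   # w approaches the true coefficients [1.0, -2.0, 0.5]
```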
Convolution: an image-processing technique that changes the intensity of a pixel to reflect the intensities of the surrounding pixels, e.g. image effects like blur, sharpen, and edge detection.
source: https://developer.apple.com/library/ios/documentation/Performance/Conceptual/vImage/ConvolutionOperations/ConvolutionOperations.html
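A minimal 2D convolution sketch with an edge-detection kernel (scipy and this particular kernel are assumptions; any image array works):

```python
import numpy as np
from scipy.signal import convolve2d

image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0                      # a bright square on a dark background

kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]])          # classic edge-detection kernel

edges = convolve2d(image, kernel, mode="same")   # each output pixel reflects
print(edges)                                     # its neighbors' intensities
```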
Ways to address vanishing gradients:
• Smart weight initialization
• Use ReLU and Maxout units
• Train for a longer time
• Use restarts
Instead of learning all the network weights jointly, build each layer iteratively.
• Start by learning the layer nearest to the inputs.
• Proceed adding layers until the last hidden layer.
• When adding a new layer, the weights from previous layers are kept fixed.
• Since one layer is learned at a time, vanishing gradients are avoided.
Two options for the output:
• Keep pre-trained weights fixed.
  – Deep network ≃ feature generator: more useful features are obtained from the inputs.
  – The output layer is trained as a perceptron; its inputs are the features obtained.
• Discriminative fine-tuning.
  – Optimize jointly over all network weights (e.g. using backpropagation).
  – Pre-trained weights seem to be a better choice than random initialization.
  – Slower than keeping the weights fixed.
GOAL: efficient reconstruction of the input data. Use the input as both the input and the output of the network and train (don't use the target). Learn the weights using a standard neural network method (e.g. backpropagation).
After learning, W_enc compresses the information in the data into the hidden units, and W_dec decompresses the information in the hidden units back to its original form (with some loss).
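In symbols (sigmoid activations assumed here for illustration):

```latex
h = \sigma(W_{\text{enc}}\, x + b_{\text{enc}}), \qquad
\hat{x} = \sigma(W_{\text{dec}}\, h + b_{\text{dec}}), \qquad
\text{minimize } \lVert x - \hat{x} \rVert^2
```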
• Enforce sparsity. (A network is sparse if only a few of its units are > 0 simultaneously.)
• Introduce random noise in the input patterns when training the network. (Only in the inputs!)
• Train an autoencoder to reconstruct the training data.
• Proceed iteratively, building new autoencoders that reconstruct the values of the hidden units of the previous stage.
• Create a feedforward network using the encoding weights W_enc from all the trained autoencoders.
• As the output layer, use the training labels.
• Fine-tuning: train the last layer or the full network using standard backpropagation.
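A minimal sketch of this greedy stacking pipeline (numpy only; sigmoid units, squared-error loss, toy random data, and the layer sizes are all assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=50, seed=0):
    """Train a one-hidden-layer autoencoder on X; return (W_enc, b_enc)."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W_enc = rng.normal(0, 0.1, (n_in, n_hidden)); b_enc = np.zeros(n_hidden)
    W_dec = rng.normal(0, 0.1, (n_hidden, n_in)); b_dec = np.zeros(n_in)
    for _ in range(epochs):
        H = sigmoid(X @ W_enc + b_enc)           # encode
        X_hat = sigmoid(H @ W_dec + b_dec)       # decode (reconstruct the input)
        # Backpropagation of the squared reconstruction error.
        d_out = (X_hat - X) * X_hat * (1 - X_hat)
        d_hid = (d_out @ W_dec.T) * H * (1 - H)
        W_dec -= lr * H.T @ d_out / len(X); b_dec -= lr * d_out.mean(axis=0)
        W_enc -= lr * X.T @ d_hid / len(X); b_enc -= lr * d_hid.mean(axis=0)
    return W_enc, b_enc

# Greedy stacking: each new autoencoder reconstructs the previous hidden units.
X = np.random.rand(256, 64)                      # toy data (e.g. 8x8 images)
layers, H = [], X
for n_hidden in (32, 16):
    W, b = train_autoencoder(H, n_hidden)
    layers.append((W, b))                        # keep W_enc for the final network
    H = sigmoid(H @ W + b)
# H now feeds a supervised output layer; fine-tune with backprop if desired.
```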
Some ways to augment data: • Rotation: random angle between 0° and 360° • Translation: random shift between -10 and 10 pixels • Rescaling: random scale factor between 1/1.6 and 1.6 • Flipping: yes or no • Shearing: random angle between -20° and 20° • Stretching: random with stretch factor between 1/1.3 and 1.3
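A minimal sketch applying several of these augmentations with scipy.ndimage (the helper name and parameter choices are assumptions; shearing and stretching would use ndimage.affine_transform similarly):

```python
import numpy as np
from scipy import ndimage

def augment(img, rng):
    """Apply one random rotation, translation, rescaling, and flip."""
    img = ndimage.rotate(img, rng.uniform(0, 360), reshape=False, mode="nearest")
    img = ndimage.shift(img, rng.uniform(-10, 10, size=2), mode="nearest")
    scale = np.exp(rng.uniform(np.log(1 / 1.6), np.log(1.6)))
    img = ndimage.zoom(img, scale, mode="nearest")   # note: changes the array
    if rng.random() < 0.5:                           # size; crop/pad back in
        img = np.fliplr(img)                         # practice
    return img

rng = np.random.default_rng(0)
print(augment(np.random.rand(28, 28), rng).shape)
```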
• Accelerated computations on float32 data.
• Matrix multiplication, convolution, and large element-wise operations can be accelerated a lot (5-50x).
• It is difficult to parallelize dense neural networks efficiently across multiple GPUs (an active area of research).
• Convolutional neural networks, unlike dense neural networks, can be run very efficiently on multiple GPUs: their use of weight sharing makes data parallelism very efficient.
• Copying large quantities of data to and from a device is relatively slow.
• NVIDIA has released cuDNN, a library of optimized primitives for deep learning on CUDA.