

Presentation material from the Korea Data Science Society (koreadatascience.org) 2014 conference
2014.11.7
Session 1-3
"Deep Learning - A New Trend in AI and Machine Learning"
- Prof. In-Jung Kim (Handong Global University)

Deep Learning:
A New Trend in AI / Machine Learning

In-Jung Kim
Handong Global University
2014. 11. 7.

Agenda

Introduction to Deep Learning
Deep Learning Algorithms
Successful Applications of Deep Learning
Q&A

Machine Learning

Learn from data
Data-driven approach (vs. knowledge-based approach)

[Figure: a trainable framework whose parameters are trained from training samples and which maps an input to a result; example frameworks: (deep) neural networks, Bayesian classifier, HMM, MRF, CRF, SVM, etc.]

Knowledge-based vs. Data-driven

Knowledge-based approaches
  Intuitive
  Dependent on the designer's knowledge
  Difficult to justify or improve

Data-driven approaches
  Learn from data
  Require training data
  Given training data, easy to (re)build
  Trained model can be difficult to understand

Given sufficient training samples,
data-driven approach > knowledge-based approach

Neural Networks

An artificial neural network is a mathematical model inspired by biological neural networks.
  Intelligence comes from their connection weights
  Connection weights are decided by learning or adaptation

[Figure: a network of connected nodes mapping inputs x1, x2, ..., xn to outputs o1, ..., om]

Neural Networks

A neural network is a mathematical model that learns mappings
  Mapping from a vector to another vector (or a scalar value)
  Examples:
    Pattern → class (classification)
    Independent variables → dependent variables (regression)
    Information → decision
    History → future

[Figure: a network mapping an input vector (x1, ..., xn) to an output vector (o1, ..., om)]
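
To make the idea of learning a vector-to-vector mapping concrete, here is a minimal NumPy sketch (not from the slides; the layer sizes and weights are arbitrary placeholders) of a one-hidden-layer network mapping an input vector x1..xn to an output vector o1..om:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x, W1, b1, W2, b2):
        """Map an input vector x to an output vector o through one hidden layer."""
        h = sigmoid(W1 @ x + b1)     # hidden representation
        return sigmoid(W2 @ h + b2)  # output vector o

    rng = np.random.default_rng(0)
    n, hidden, m = 4, 8, 3                        # input / hidden / output sizes (arbitrary)
    W1, b1 = rng.normal(size=(hidden, n)), np.zeros(hidden)
    W2, b2 = rng.normal(size=(m, hidden)), np.zeros(m)

    x = rng.normal(size=n)                        # an input vector (x1, ..., xn)
    print(forward(x, W1, b1, W2, b2))             # an output vector (o1, ..., om)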

Neural Networks

Neural networks can learn probability distributions from training samples
  Examples:
    Approximate joint prob. P(X,Y), or conditional prob. P(X|Y)
    Likelihood P(x|ω), a posteriori probability P(ω|x)
    Classification, sampling, restoration

[Figure: a network trained on samples { X1, X2, ... } drawn from a distribution f(X)]
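
As a sketch of how a network output can represent such a distribution (my own illustration, not the author's code): with a softmax output layer the m outputs are non-negative and sum to one, so they can be read as an estimate of the a posteriori probability P(ω|x), and a class can be sampled from them:

    import numpy as np

    def softmax(z):
        z = z - z.max()              # subtract the max for numerical stability
        e = np.exp(z)
        return e / e.sum()

    rng = np.random.default_rng(0)
    W, b = rng.normal(size=(3, 4)), np.zeros(3)   # placeholder weights: 4 inputs, 3 classes

    x = rng.normal(size=4)
    p = softmax(W @ x + b)           # estimate of the posterior P(class | x)
    print(p, p.sum())                # non-negative values that sum to 1
    print("sampled class:", rng.choice(len(p), p=p))   # sampling from the distribution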

Deep Neural Networks

A deep neural network (DNN) is a neural network with multiple levels of nonlinear operations.

[Figure: input → layer 1 → layer 2 → ... → output]

Network Depth and Decision Region

[Lippmann87]
[Figure: decision regions realizable at each network depth —
  single layer: half plane bounded by a hyperplane
  two layers: convex (open or closed) regions
  three layers: arbitrary regions (complexity limited by the number of nodes)]

Why Deep Networks?

Efficient in modeling complex functions
  Representation of some functions needs sufficiently many layers
  Stepwise abstraction to learn high-level features
Large capacity
  DNNs can learn very well from a huge volume of samples
Integrated learning
  A DNN integrates feature extractor and classifier in a single network

Stepwise Abstraction

Abstraction from low-level representation to high-level representation
  Similar to the human perception process

[Figure: features become progressively more abstract from the input layer up through the layers to the output (Lee12)]

Integrated Learning

Deep networks optimize both feature extractor and classifier in a unified framework.

[Figure: conventional system: input → feature extractor → classifier → output;
 deep neural network: input → DNN → output]

Challenges with Deep Networks

Hard to optimize
  Back-propagation does not work well for deep fully connected networks starting from random weights
  → New training algorithms
A large number of parameters
  → A huge volume of training samples is now available
  → Techniques to improve generalization ability
     Ex) sparse coding, virtual sample generation, dropout
Requires heavy computation
  → GPU-based massive parallel processing
  → H/W implementation (SoC, FPGA)

The Back-Propagation Algorithm

Gradient descent algorithm to minimize the error E.

[Figure: a deep network — input layer (X0), layer 1 (X1), layer 2 (X2), ..., output layer (XN), connected by weights W1, W2, ..., WN; features propagate forward while error signals propagate backward]

\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial net_j}\,\frac{\partial net_j}{\partial w_{ij}} = \delta_j x_i ,
\qquad \delta_j = \frac{\partial E}{\partial net_j}

\delta_i = f'(net_i) \sum_j w_{ij}\,\delta_j \quad \text{(at a non-output node } i\text{)}
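
A minimal NumPy sketch of these equations for a two-layer network with sigmoid units and a squared-error E (my own illustrative assumptions, not the author's code): the output δ is the error scaled by f'(net), a hidden δ_i is f'(net_i) Σ_j w_ij δ_j, and each weight moves by −η δ_j x_i:

    import numpy as np

    def f(z):                      # sigmoid activation
        return 1.0 / (1.0 + np.exp(-z))

    def f_prime(net):              # derivative of the sigmoid
        y = f(net)
        return y * (1.0 - y)

    def bp_step(x, target, W1, W2, lr=0.5):
        """One gradient-descent step on E = 0.5 * ||o - target||^2."""
        net1 = W1 @ x; h = f(net1)                # forward: hidden layer
        net2 = W2 @ h; o = f(net2)                # forward: output layer
        delta2 = (o - target) * f_prime(net2)     # delta_j = dE/dnet_j at the output
        delta1 = f_prime(net1) * (W2.T @ delta2)  # delta_i = f'(net_i) * sum_j w_ij * delta_j
        W2 -= lr * np.outer(delta2, h)            # dE/dw_ij = delta_j * x_i
        W1 -= lr * np.outer(delta1, x)
        return 0.5 * np.sum((o - target) ** 2)

    rng = np.random.default_rng(0)
    W1 = rng.normal(scale=0.5, size=(5, 3))       # 3 inputs -> 5 hidden units
    W2 = rng.normal(scale=0.5, size=(2, 5))       # 5 hidden units -> 2 outputs
    x, t = rng.normal(size=3), np.array([0.0, 1.0])
    for step in range(200):
        err = bp_step(x, t, W1, W2)
    print("final error:", err)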

BP on Deep Networks

BP does not work on deep networks
  Error signals from many nodes are blended together,
  and become dim and vague on the bottom layers
  The "diminishing gradient problem"

Error signal at a non-output node i:
\delta_i = f'(net_i) \sum_j w_{ij}\,\delta_j

[Figure: node i receiving error signals \delta_j from the nodes j above it through weights w_{ij}]
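
A small numerical sketch of this effect (my own illustration; the width, depth, and weight scale are arbitrary): back-propagating an error signal through several sigmoid layers with typical small random weights, the norm of δ shrinks layer after layer, so the bottom layers receive almost no learning signal:

    import numpy as np

    rng = np.random.default_rng(0)
    width, depth = 50, 8
    scale = 1.0 / np.sqrt(width)            # typical small random weights

    # Forward pass through `depth` sigmoid layers
    x = rng.normal(size=width)
    Ws, acts = [], [x]
    for _ in range(depth):
        W = rng.normal(scale=scale, size=(width, width))
        acts.append(1.0 / (1.0 + np.exp(-(W @ acts[-1]))))
        Ws.append(W)

    # Back-propagate an error signal and watch its norm per layer
    delta = rng.normal(size=width)          # stand-in error signal at the top layer
    for layer in range(depth - 1, -1, -1):
        y = acts[layer + 1]
        delta = (y * (1.0 - y)) * delta     # scale by f'(net) of this layer
        delta = Ws[layer].T @ delta         # send it down through the weights
        print(f"below layer {layer + 1}: ||delta|| = {np.linalg.norm(delta):.2e}")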

Agenda

Introduction to Deep Learning
Deep Learning Algorithms
Successful Applications of Deep Learning
Q&A

Breakthroughs in Deep Learning

The conventional back-propagation algorithm does not work well for deep fully-connected networks starting from random weights.

Layer-wise unsupervised pre-training algorithms
  Ex) DBN [Hinton2006], stacked auto-encoders [Bengio2006]
  First, place the weights near a locally optimal position by an unsupervised learning algorithm
  Then, conventional supervised learning algorithms work fine

Network structures that prevent the diminishing gradient problem
  Ex) Convolutional Neural Networks [Fukushima1980][LeCun1998]

Layer-wise Unsupervised Pre-training

Based on generative neural networks

Training procedure (a sketch follows below)
  1. Pre-train each layer to reproduce its input by an unsupervised learning algorithm
  2. Fine-tune the whole network by a supervised learning algorithm
     Ex) wake-sleep [Hinton2003], back-propagation
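
A compact sketch of this two-step procedure using stacked auto-encoders with tied weights (my own illustrative NumPy code; the layer sizes, squared-error criterion, and toy data are assumptions, and the supervised fine-tuning step is only indicated by a comment):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def pretrain_layer(data, n_hidden, lr=0.1, epochs=30, rng=None):
        """Train one auto-encoder layer (tied weights) to reproduce its input."""
        if rng is None:
            rng = np.random.default_rng(0)
        n_visible = data.shape[1]
        W = rng.normal(scale=0.1, size=(n_hidden, n_visible))
        b_h, b_v = np.zeros(n_hidden), np.zeros(n_visible)
        for _ in range(epochs):
            for x in data:
                h = sigmoid(W @ x + b_h)        # encoding (forward connections)
                r = sigmoid(W.T @ h + b_v)      # decoding (backward connections)
                dr = (r - x) * r * (1 - r)      # gradient of 0.5*||r - x||^2 w.r.t. decoder net
                dh = (W @ dr) * h * (1 - h)
                W -= lr * (np.outer(dh, x) + np.outer(h, dr))
                b_v -= lr * dr
                b_h -= lr * dh
        return W, b_h

    # 1. Greedy layer-wise unsupervised pre-training
    rng = np.random.default_rng(0)
    data = rng.random((100, 20))                # toy unlabeled samples (placeholder)
    stack, reps = [], data
    for n_hidden in (16, 8):                    # two hidden layers, arbitrary sizes
        W, b = pretrain_layer(reps, n_hidden, rng=rng)
        stack.append((W, b))
        reps = sigmoid(reps @ W.T + b)          # encodings become the next layer's input

    # 2. Fine-tune the whole pre-trained stack with a supervised algorithm
    #    (e.g. back-propagation on labelled data) -- omitted in this sketch.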

Generative Neural Networks

Neural networks with forward-backward connections
  Forward connections for "encoding"
  Backward connections for "decoding"

[Figure: a feed-forward network (input layer X0, layers X1, X2, ..., output layer XN with weights W1, ..., WN) beside a forward-backward network in which each layer also has backward connections W1, ..., WN for decoding]

Layer-wise Unsupervised Pre-training (1st and 2nd phases)

Starting from the bottom layer, train each layer to reproduce its input
  Input → encoding → hidden → decoding → reproduction of the input

[Figures (1st and 2nd phases): the forward-backward network with layers X0, X1, X2, ..., XN and weights W1, ..., WN; in each phase one layer is trained, using forward propagation for encoding and backward propagation for decoding]

Convolutional Neural Networks

Neocognitron [Fukushima80]
  Designed to imitate the visual processing of humans/animals
  Suggested the basic concept and network structure of CNNs
LeNet [LeCun98]
  Simplified node and network structure
  Gradient-based learning
Many improvements and extensions
  [Simard2003], [Ciresan2011]
  Convolutional DBN [Lee2009]
  Siamese network [Chopra2005]
  Locally connected network [Taigman2014]
  Fast training algorithm using FFT [Mathieu2014]

Convolutional Neural Networks

Composed of many heterogeneous layers
  Convolution layers – feature extraction
  Max-pooling layers – feature abstraction
  Fully-connected layers – classification

Convolution Layers

Odd-numbered layers in the low/middle levels of a CNN
Nodes on each layer are grouped into 2D planes (or feature maps)
Each plane is connected to one or more input planes
Each node computes a weighted sum of the input nodes in a small region
All nodes on a plane share the same weight set
Extract features by the convolution operation
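
A minimal NumPy sketch of one such feature map (illustrative only; the 3×3 mask and single input plane are my assumptions): every node computes a weighted sum over a small region of the input plane, and all nodes of the plane share the same weight set:

    import numpy as np

    def conv2d_valid(plane, mask, bias=0.0):
        """One feature map: each node is a weighted sum over a small input
        region, and every node on the plane shares the same mask weights."""
        H, W = plane.shape
        kh, kw = mask.shape
        out = np.empty((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                region = plane[i:i + kh, j:j + kw]         # small input region
                out[i, j] = np.sum(region * mask) + bias   # shared weighted sum
        return out

    image = np.random.default_rng(0).random((8, 8))        # toy input plane
    mask = np.array([[1.0, 0.0, -1.0],                     # illustrative 3x3 mask
                     [2.0, 0.0, -2.0],
                     [1.0, 0.0, -1.0]])
    feature_map = conv2d_valid(image, mask)
    print(feature_map.shape)                               # (6, 6)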

Max-Pooling Layers

Even-numbered layers in the low/middle levels of a CNN
Nodes on each layer are grouped into planes
Each plane is connected to only one input plane
Each node chooses the maximum among the input nodes in a small region
Abstract features
  Reduces the feature dimension
  Ignores positional variation of feature elements
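
A matching NumPy sketch of a max-pooling plane (the non-overlapping 2×2 window is an illustrative choice of mine): each node takes the maximum over a small region of a single input plane, shrinking the feature map:

    import numpy as np

    def max_pool(plane, size=2):
        """Each node is the maximum over a (size x size) region of one input
        plane; the plane shrinks by a factor of `size` in each dimension."""
        H, W = plane.shape
        out = np.empty((H // size, W // size))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = plane[i*size:(i+1)*size, j*size:(j+1)*size].max()
        return out

    feature_map = np.random.default_rng(0).random((6, 6))
    pooled = max_pool(feature_map)
    print(feature_map.shape, "->", pooled.shape)           # (6, 6) -> (3, 3)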

Fully-connected Layers

Top 2~3 layers of a CNN
1D structure
Each node is fully connected to all input nodes
Each node computes a weighted sum of all input nodes
Classify the input pattern with the high-level features extracted by the previous layers

Gradient-based Learning [LeCun98]

Trains the whole network to minimize a single error function E.
The back-propagation recursion is applied layer by layer: at layer n, then at layer n-1, and so on down the network.

[Figure: deep network with input layer (X0), layers X1, X2, ..., output layer (XN) and weights W1, W2, ..., WN]

Why CNNs Work Well

The network structure effectively guides learning from 2D images while preventing the diminishing gradient problem
  Sparse connections
  Parameter tying
Good at capturing 2D structures
  Training of convolution masks is an effective way to learn feature extraction
Good at handling shape variation
  Abstraction in phases
  Max pooling
Directly trains the network to minimize the classification error

Agenda

Introduction to Deep Learning
Deep Learning Algorithms
Successful Applications of Deep Learning
Q&A

Numeral Digit Recognition (MNIST DB)

Support Vector Machines
Neural Nets
Convolutional Neural Networks (Deep Networks)

Chinese Character Recognition (CASIA DB)

ICDAR 2013 Competition Result [Yin et al. 2013]
CNN

Object Image Recognition

ImageNet Large Scale Visual Recognition Challenge 2013 (ILSVRC2013, http://www.image-net.org)
  1,000 object categories
  Training set: 1,281,167 images
  Validation set: 50,000 images
  Test set: 100,000 images

Examples of ILSVRC2013 Images

[Figure: sample images from ILSVRC2013]

ILSVRC2013 Results

All high rankers are based on CNNs

Deep Learning in Face Recognition

Face recognition flow
  1. Detection
  2. Alignment (pre-processing)
  3. Representation (feature extraction): CNN
     Robust to variation in lighting, expression, …
     Alternatives: LBP + PCA/FDA
  4. Verification / Classification: Siamese network
     Alternatives: Euclidean distance, dot product, χ² distance, SVM, …

DeepFace [Taigman2014]

Facebook AI group, Tel Aviv Univ.
Y. Taigman et al.
Achieved 97.25% on the LFW dataset
  cf. Conventional best performance: 96.33% [Cao 2013]
Face recognition procedure
  2D and 3D alignment
  CNN-based representation
  Verification by weighted χ² distance and Siamese network
A huge volume of training data
  SFC dataset (4,000 identities × 1,000 samples)
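
A small sketch of the weighted χ² similarity used for verification (my own illustration; the face descriptors, the weights — which are learned from data in the paper — and the decision threshold are placeholders here):

    import numpy as np

    def weighted_chi2(f1, f2, w, eps=1e-12):
        """Weighted chi-square distance between two non-negative feature
        vectors: sum_i w_i * (f1_i - f2_i)^2 / (f1_i + f2_i)."""
        return np.sum(w * (f1 - f2) ** 2 / (f1 + f2 + eps))

    rng = np.random.default_rng(0)
    f1, f2 = rng.random(4096), rng.random(4096)   # placeholder face descriptors
    w = np.ones_like(f1)                          # placeholder weights (learned in practice)
    d = weighted_chi2(f1, f2, w)
    print(d, "same person" if d < 100.0 else "different person")  # placeholder threshold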

DeepFace [Taigman2014]

Feature extraction by CNN
  Train a CNN-based face recognizer
  Represent the input face image by the output of the (N-1)th layer

Deep Learning in Speech Recognition

Hybrid system (HMM + DNN)

Deep Learning in Speech Recognition

Deep Neural Networks for Acoustic Modeling in Speech Recognition [Hinton2012]

Deep Learning in Speech Recognition

CNN for speech recognition [Ossama13]
  Apply a CNN to the 2D (frame, frequency band) representation

Hangul Recognition

Challenges in Hangul recognition
  A multitude of similar characters
    Missing one small stroke often results in misclassification
    Ex) 에-애-얘, 괟-괱-괠-팰, 흥-홍-훙-흉
  Excessive cursiveness

Deep Learning in Hangul Recognition

Method        Approach                               SERI95a   PE92
Kim&Kim01     Structural matching                    86.3%     82.2%
Kang&Kim04    Structural matching                    90.3%     87.7%
Jang&Kim02    Structural matching + post-processing  93.4%     N/A
Kim&Liu11     MQDF                                   93.71%    85.99%
Kim           CNN                                    95.96%    92.92%
Error reduction rates                                35.71%    42.44%

[Kim2014]

Q&A