This page reproduces the content of http://www.slideshare.net/jbhuang/how-to-come-up-with-new-research-ideas-4005840.

Uploaded on 2010/05/07 (more than 6 years ago) in the Learning category

Computer vision has been studied for more than 40 years. Because the topics in vision and related fields (e.g., machine learning, signal processing, cognitive science) are increasingly diverse and rapidly developing, coming up with new research ideas is usually daunting for junior graduate students in this field. In this talk, I will present five methods for coming up with new research ideas. For each method, I will give several examples (i.e., existing works in the literature) to illustrate how the method works in practice.

This is a common sense talk and will not have complicated math equations and theories.

Note: The content of this talk is inspired by "Raskar Idea Hexagon" - Prof. Ramesh Raskar’s talk on "How to come up with new Ideas".

To download the presentation slide with videos, please visit

http://jbhuang0604.blogspot.com/2010/05/how-to-come-up-with-new-research-ideas.html

For the video lecture (in Chinese), please visit

http://jbhuang0604.blogspot.com/2010/06/blog-post_14.html

- How to Come Up With New Research Ideas?

Jia-Bin Huang

jbhuang0604@gmail.com

Taiwan

May 2010

1 / 94 - What is this talk about?

Five approaches to come up with new ideas in computer vision.

Extensive case studies (i.e., more than one hundred papers).

A common sense talk. No complicated theories or equations.

I wish someone told me this before.

Reference

The content of this talk is greatly inspired by "Raskar Idea Hexagon".

2 - Outline

1. Introduction

2. Five ways to come up with new ideas

Seek different dimensions: neXt = X^d

Combine two or more topics: neXt = X + Y

Re-think the research directions: neXt = X̄

Use powerful tools, find suitable problems: neXt = X ↑

Add an appropriate adjective: neXt = Adj + X

3. What is a bad idea?

4 - Active Topics in Computer Vision

[Szeliski Computer Vision: Algorithms and Applications 2010]

Digital image processing

Blocks world, line labeling

Generalized cylinders

Pictorial structures

Stereo correspondence

Intrinsic images

Optical flow

Structure from motion

Image pyramids

Scale-space processing

Shape from X

Physically-based modeling

Regularization

Markov Random Fields

Kalman filters

3D range data processing

Projective invariants

Factorization

Physics-based vision

Graph cuts

Particle filtering

Energy-based segmentation

Face recognition and detection

Subspace methods

Image-based modeling/rendering

Texture synthesis/inpainting

Computational photography

Feature-based recognition

MRF inference algorithms

Learning

5 - What can we learn from the past?

The topics are diverse and evolve over time.

The ways to come up with new ideas are similar. There are

patterns to follow.

8 - Seek different dimensions

neXt = X^d

The only difference between a rut and a grave is their dimensions. - Ellen Glasgow

9 - Seek different dimensions

neXt = X^d

Idea

Can we increase/replace/transform the dimensions of the original problem to get new problems/solutions?

What kind of dimensions can we work on?

1. Concrete dimensions (e.g., space, time, frequency)

2. Abstract dimensions (e.g., properties)

10 - EX 1-1. Content-Aware Media Resizing

[Avidan et al. SIGGRAPH 07] [Rubinstein et al. SIGGRAPH 08]

Ideas

Extend dimensions from 2D image to 3D video: image re-targeting

⇒ video re-targeting

Other dimensions? E.g., 4D light field, infrared image, range

image.

11 - EX 1-2. Video Stitching

[Rav-Acha et al. CVPR 05]

Input video

Dynamic Panorama

Ideas

Extend dimensions from image to video, i.e., Image Panorama ⇒

Video Mosaics with Non-Chronological Time

Increase the time dimension in both input and output

12 - EX 1-3. Multi-Image Fusion

[Agarwala et al. SIGGRAPH 04]

Ideas

Extend from single input image to multiple input images ⇒ Digital

Photomontage

Increase the dimension in input only.

13 - EX 1-4. Computational Photography (Coded Photography)

[Raskar et al. SIGGRAPH 04, 06, 08] [Levin et al. SIGGRAPH 07]

Ideas

Coded Photography: reversibly encode information about the

scene in a single photograph

Coding in Time (Exposure), Coded Illumination, Coding in Space

(aperture), and Coded Wavelength

Replace the dimension to code information of the light field

14 - EX 1-1. Photography in Low Light Conditions

Flash

Blurred

Noisy

What can we do?

Flash → Changes the overall scene appearance (cold and gray)

Long exposure time (hand shake) → Blurred image

Short exposure time (insufficient light) → Noisy image

15 - EX 1-1-1. Flash/non-Flash Photography

[Petschnigg et al. SIGGRAPH 2004]

Flash

No flash

Detail transfer with denoising

Ideas

The original problem (taking a good photo in a low-light environment from a single image) is difficult.

Increasing the dimension of the input (a flash/no-flash image pair) makes the problem much easier.

16 - EX 1-1-2. Image Deblurring with Blurred/Noisy Image

Pairs

[Yuan et al. SIGGRAPH 2007]

Blurred

Noisy

Enhanced noisy

Deblurred result

Ideas

The original problem (taking a good photo in a low-light, flash-prohibited environment from a single image) is difficult.

Increasing the dimension of the input (a blurred/noisy image pair) makes the problem much easier.

17 - EX 1-1-3. Robust Flash Deblurring

[Zhou et al. CVPR 2010]

Ideas

The original problem (taking a good photo in a low-light environment from a single image) is difficult.

Increasing the dimension of the input (a blurred/flash image pair) makes the problem much easier.

18 - EX 1-1-4. Dark Flash Photography

[Krishnan et al. SIGGRAPH 2009]

Ideas

The original problem (taking a good photo in a low-light environment from a single image) is difficult.

Increasing the dimension of the input (a dark-flash/noisy image pair) makes the problem much easier.

19 - EX 1-2. Brute-Force Vision

[Hays and Efros SIGGRAPH 07] [Dale et al. ICCV 09] [Agarwal et al. ICCV 09]

[Furukawa et al. ICCV 09]

Ideas

Utilize a large collection of photos.

20 - EX 2-1. X Alignment/Registration (pixel, object, scene)

[Liu et al. CVPR 08, ECCV 08] [Berg et al. CVPR 05]

21 - EX 2-2. Shape from X (shading, texture, specular)

[Lobay and Forsyth IJCV 06] [Fleming et al JOV 04] [Adato et al ICCV 07]

shading

specular

texture

specular flow

22 - EX 2-3. Depth from X (stereo, (de-)focus, coded

aperture, diffusion, occlusion, semantic label)

[Levin et al. SIGGRAPH 07] [Hoiem et al. ICCV 07] [Liu et al. CVPR 10] [Zhou et al.

CVPR 10]

Coded Aperture

Semantic Labels

Occlusion

Diffusion

23 - EX 2-4. Infer X from a single image (geometric,

geography, illumination)

[Hoiem et al. ICCV 05] [Hays and Efros CVPR 08] [Lalonde et al. ICCV 09]

Geometric

Geography

Illumination

25 - Combine two or more topics

neXt = X + Y

To steal ideas from one person is

plagiarism. To steal from many is

research. - Wilson Mizner

26 - Combine two or more topics

neXt = X + Y

Idea

Can we combine two or more topics to get new problems or

solutions?

What kind of topics can we combine?

1. X, Y are methods

2. X, Y are problems

3. X, Y are areas

27 - EX 1-1. Viola-Jones Object Detection Framework

[Viola and Jones CVPR 2001]

Simple feature

Integral img

Boosting

Cascade structure

Ideas

Paper title: Rapid Object Detection using a Boosted Cascade of

Simple Features

Viola-Jones object detection framework = Integral Images (simple features) (1984) + AdaBoost (1997) + Cascade Architecture (long time ago)
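
To make the "integral image" ingredient concrete, here is a minimal sketch in Python with NumPy. It builds a summed-area table and evaluates a rectangle sum with four lookups, which is what makes simple rectangle features cheap. The function names and the particular two-rectangle feature are illustrative assumptions, not the detector from the paper.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with an extra zero row and column in front."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom, left:right] using four table lookups."""
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def two_rect_feature(ii, top, left, height, width):
    """Hypothetical two-rectangle feature: left half minus right half."""
    half = width // 2
    left_sum = box_sum(ii, top, left, top + height, left + half)
    right_sum = box_sum(ii, top, left + half, top + height, left + 2 * half)
    return left_sum - right_sum

# Toy 4x4 "image" (an assumption for illustration only).
img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
inner = box_sum(ii, 1, 1, 3, 3)            # same as img[1:3, 1:3].sum()
feature = two_rect_feature(ii, 0, 0, 4, 4)
```

After the table is built once, every rectangle sum costs the same four lookups regardless of rectangle size, which is why thousands of such features can be evaluated per window.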

28 - EX 1-2. SIFT Flow = SIFT + Optical Flow

[Liu et al. ECCV 08 CVPR 09]

Motion hallucination

Label transfer

Ideas

Dense sampling in time : optical flow :: dense sampling in world

images : SIFT flow

29 - EX 1-3. Visual Tracking with Online Multiple Instance

Boosting

[Babenko et al. CVPR 09]

Ideas

MILTrack = Multiple Instance Boosting (2005) + Online Boosting

Tracking (2006)

30 - EX 2-1. High Dynamic Range Image Reconstruction

from Hand-held Cameras

[Lu et al. CVPR 2009]

Ideas

HDR from Hand-held Cameras = High Dynamic Range Image Reconstruction + Image Deblurring

31 - EX 2-2. Human Body Understanding

[Guan et al. ICCV 09]

Ideas

Human Body Understanding = Shape Reconstruction + Pose

Estimation

32 - EX 2-3. Image Understanding

detection, tracking, recognition, segmentation, reconstruction, scene classification,

event recognition

33 - EX 2-3-1. Detection + Tracking

[Andriluka et al. CVPR 08]

Ideas

People detection and people tracking are highly correlated

problems.

Combining the two problems can potentially improve performance on each individual task.

34 - EX 2-3-2. Object Attribute + Recognition

[Farhadi et al. CVPR 09] [Lampert et al. CVPR 09]

Ideas

Describe image by attributes

Enable knowledge transfer to recognition classes with no visual examples

35 - EX 2-3-2. Object Recognition + Detection

[Yeh et al. CVPR 09]

Ideas

Concurrent object localization and recognition

36 - EX 2-3-3. Image Segmentation + Object Recognition

+ Event Recognition

[Li et al. CVPR 09]

Ideas

Combine scene classification, image segmentation, image

annotation

All three tasks are mutually beneficial

37 - EX 3-1. SixthSense - A Wearable Gestural Interface

[Mistry and Maes TED 2009]

Ideas

SixthSense = Computer Vision (e.g., tracking, recognition) +

Internet

38 - EX 3-2. Sikuli: Picture-driven computing

[Yeh et al. UIST 09] [Chang et al. CHI 10]

Ideas

1. Readability/usability, 2. GUI serialization, 3. Computer vision

on computer-generated figures

40 - Re-think the research directions

neXt = X̄

If at first, the idea is not absurd, then

there is no hope for it -

Albert Einstein

41 - Re-think the research directions

neXt = X̄

Ideas

Do the current research directions really make sense? What's the key problem?

What could we do?

1. Re-formulate the original problem.

2. Analyze and compare existing approaches. Provide insight into the problems.

42 - EX 1-1. Beyond Sliding Windows

[Lampert et al. CVPR 08]

Rectangle set

Branch and bound search

Ideas

Sliding window search ⇔ branch-and-bound search

Represent a set of rectangles with 4 intervals

Use branch-and-bound to find the optimal rectangle (object localization) efficiently

43 - EX 1-2. Beyond Categories

[Malisiewicz and Efros CVPR 08, NIPS 09]

Ideas

Explicit categorization ⇔ Implicit categorization

Ask "what is this like?" (association), instead of "what is it?"

(categorization)

44 - EX 1-3. Motion-Invariant Photography

[Levin et al. SIGGRAPH 08] [Cho et al. ICCP 10]

Ideas

Still camera ⇔ Moving camera (parabolic exposures)

Enable the use of spatial-invariant blur kernel estimation

45 - EX 1-4. Super-resolution from Single Image

[Glasner et al. ICCV 09]

Ideas

Classical multi-image SR/Example-based SR ⇔ Single SR framework

46 - EX 2-1. In Defense of ...

[Boiman et al. CVPR 08] [Hartley PAMI 97]

In Defense of Nearest-Neighbor Based Image Classification

Two common practices degrade nearest-neighbor (NN) image classifiers:

Quantization of local image descriptors (used to generate "bags-of-words", codebooks).

Computation of "Image-to-Image" distance, instead of "Image-to-Class" distance.

Without them, NN performance ranks among the top leading learning-based image classifiers.

In Defense of the 8-point Algorithm (for the fundamental matrix)

Normalization, Normalization, Normalization!

With normalized input coordinates, it performs almost as well as the best iterative algorithms.
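
The "normalization" step for the 8-point algorithm can be sketched as follows: translate the points so their centroid is at the origin, then scale them isotropically so the mean distance from the origin is sqrt(2). The function name and the sample coordinates are assumptions for illustration, and the rest of the 8-point algorithm is not shown.

```python
import numpy as np

def normalize_points(pts):
    """Normalize 2D points (N x 2) before estimating the fundamental matrix.

    Returns the normalized homogeneous points (N x 3) and the 3 x 3
    similarity transform T that produced them.
    """
    centroid = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - centroid, axis=1).mean()
    s = np.sqrt(2.0) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    homog = np.column_stack([pts, np.ones(len(pts))])
    return (T @ homog.T).T, T

# Hypothetical pixel coordinates, for illustration only.
pts = np.array([[100.0, 200.0], [300.0, 250.0], [220.0, 400.0], [150.0, 320.0]])
normalized, T = normalize_points(pts)
```

Each image gets its own transform. A fundamental matrix estimated from the normalized points is mapped back to pixel coordinates with the two transforms.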

47 - EX 2-2. Understanding blind deconvolution

[Levin et al. CVPR 2009]

Ideas

Blind deconvolution: recover the sharp image x from the blurred one, $y = k \otimes x + n$ (k: blur kernel, n: noise).

$\mathrm{MAP}_{x,k}$ estimation often favors no-blur explanations.

$\mathrm{MAP}_k$ can be accurately estimated since the kernel size is often smaller than the image size.

Blind deconvolution should be addressed in this way: $\mathrm{MAP}_k$ + non-blind deconvolution.
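
Written out, the two estimators differ in what is maximized. The block below is a sketch in standard MAP notation for the model y = k ⊗ x + n. The hats and the posterior symbol p are notation assumed here, not taken from the slide.

```latex
% MAP_{x,k}: estimate the sharp image and the kernel jointly
(\hat{x}, \hat{k}) = \arg\max_{x,\,k} \; p(x, k \mid y)

% MAP_k: marginalize over the image and estimate only the kernel
\hat{k} = \arg\max_{k} \; p(k \mid y)
        = \arg\max_{k} \int p(x, k \mid y)\, dx

% Then solve a non-blind deconvolution with the kernel held fixed
\hat{x} = \arg\max_{x} \; p(x \mid y, \hat{k})
```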

48 - EX 2-3. Understanding camera trade-offs

[Levin et al. ECCV 08]

Ideas

Traditional optics evaluation: 2D image sharpness (e.g., Modulation Transfer Function)

Modern camera evaluation: How well does the recorded data

allow us to estimate the visual world - the lightfield?

49 - EX 2-4. What is a good image segment?

[Bagon et al. ECCV 08]

Ideas

Good image segment as one which can be easily composed using

its own pieces, but is difficult to compose using pieces from other

parts of the image

50 - EX 2-5. Lambertian Reflectance and Linear

Subspaces

[Basri and Jacobs PAMI 03]

Ideas

The set of all Lambertian reflectance functions (the mapping from

surface normals to intensities) obtained with arbitrary distant light

sources lies close to a 9D linear subspace.

Explain prior empirical results using linear subspace methods.

52 - Use powerful tools, find suitable problems neXt = X ↑

If the only tool you have is a hammer,

you tend to see every problem as a

nail. - Abraham Maslow

53 - Use powerful tools, find suitable problems neXt = X ↑

What kinds of tools should we understand?

Calculus of Variations

Dimensionality Reduction

Spectral Methods (specifically, spectral clustering)

Probabilistic Graphical Model

Structured Prediction

Bilateral Filtering

Sparse Representation

and more ... spectral method/theory, information theory, (convex)

optimization, etc

54 - EX 1. Calculus of Variations (1/2)

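From Calculus to Calculus of Variations

Calculus

Functions: $f: \mathbb{R}^n \to \mathbb{R}$

Derivative: $\frac{df(x)}{dx} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}$

Local extremum: $\frac{df(x)}{dx} = 0$

Calculus of Variations

Functionals (functions of functions): $f: F \to \mathbb{R}$, $f(u) = \int_{x_1}^{x_2} L(x, u(x), u'(x))\,dx$

Variation: $\frac{df(u)}{du} = \lim_{\epsilon \to 0} \frac{f(u + \epsilon\,\delta u) - f(u)}{\epsilon} = \left.\frac{\partial}{\partial \epsilon} f(u + \epsilon\,\delta u)\right|_{\epsilon = 0}$

Local extremum: Euler-Lagrange equation

Total Variation (TV)

$TV(y) = \int_{x_0}^{x_1} |y'|\,dx$: the "oscillation strength" of y(x)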
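
As a worked illustration of the local-extremum condition for functionals, the Euler-Lagrange equation for a functional of the form used on this slide is shown below. The example Lagrangian is an assumption chosen for simplicity.

```latex
% Functional
f(u) = \int_{x_1}^{x_2} L\bigl(x, u(x), u'(x)\bigr)\, dx

% Euler-Lagrange equation: necessary condition for a local extremum
\frac{\partial L}{\partial u} - \frac{d}{dx}\,\frac{\partial L}{\partial u'} = 0

% Example (assumed): L = \tfrac{1}{2}\, u'(x)^2
\frac{\partial L}{\partial u} = 0, \qquad
\frac{\partial L}{\partial u'} = u'(x)
\quad\Longrightarrow\quad u''(x) = 0
```

For this choice of L, the extremal functions are straight lines between the boundary values.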

55 - EX 1. Calculus of Variations (2/2)

Total Variation Denoising/Inpainting

Applications in computer vision

Optical flow [Horn and Schunck AI 81]

Shape from shading [Horn and Brooks CVGIP 86]

Edge detection [PAMI 87]

Anisotropic diffusion [Perona and Malik PAMI 90]

Active contours model [Kass et al. IJCV 88]

Image segmentation [Morel and Solimini 95]

Image restoration [Aubert and Vese SIAM Journal on NA 97]
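
A minimal 1D sketch of total variation denoising follows, in the spirit of the Rudin-Osher-Fatemi model. It runs plain gradient descent on a data term plus a smoothed TV penalty. The smoothing constant `eps`, the step size, the weight `lam`, and the test signal are all assumptions chosen so that this simple scheme stays stable. Practical solvers use more careful optimization.

```python
import numpy as np

def total_variation(y):
    """Discrete TV: sum of absolute differences between neighbors."""
    return np.abs(np.diff(y)).sum()

def tv_denoise_1d(noisy, lam=0.5, step=0.02, iters=2000, eps=1e-2):
    """Gradient descent on 0.5*||u - noisy||^2 + lam * sum(sqrt(diff(u)^2 + eps))."""
    u = noisy.copy()
    for _ in range(iters):
        d = np.diff(u)
        w = d / np.sqrt(d * d + eps)   # derivative of the smoothed |d|
        grad_tv = np.zeros_like(u)
        grad_tv[:-1] -= w              # each difference pulls on its left sample
        grad_tv[1:] += w               # and on its right sample
        u -= step * ((u - noisy) + lam * grad_tv)
    return u

# Assumed test signal: a unit step corrupted by Gaussian noise.
rng = np.random.default_rng(0)
clean = np.concatenate([np.zeros(50), np.ones(50)])
noisy = clean + 0.2 * rng.standard_normal(100)
denoised = tv_denoise_1d(noisy)
```

The penalty lowers the "oscillation strength" of the signal while the data term keeps it close to the observation, so noise is flattened and the step is largely retained.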

56 - EX 2. Dimensionality Reduction (1/2)

Why do we need dimensionality reduction?

Since high-dimensional data is everywhere (e.g., images, human gene distributions, weather prediction), we need dimensionality reduction for

1. processing data efficiently

2. estimating the distributions of data accurately (curse of dimensionality)

3. finding meaningful representations of data

Classification of dimensionality reduction methods

Linear, global structure preserved: PCA, LDA

Linear, local structure preserved: LPP, NPE

Nonlinear, global structure preserved: ISOMAP, Kernel PCA, DM

Nonlinear, local structure preserved: LLE, LE, HE
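
As a small illustration of the linear, global case, PCA can be sketched with a singular value decomposition of the centered data. The function name and the synthetic data are assumptions for illustration.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the top principal directions.

    Returns (projected data, principal directions, explained variances).
    """
    mean = X.mean(axis=0)
    Xc = X - mean                                   # center the data
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                  # rows are unit directions
    projected = Xc @ components.T
    explained_var = (S[:n_components] ** 2) / (len(X) - 1)
    return projected, components, explained_var

# Assumed toy data: 2D points lying close to the line y = 2x.
rng = np.random.default_rng(0)
t = rng.standard_normal(200)
X = np.column_stack([t, 2.0 * t]) + 0.01 * rng.standard_normal((200, 2))
Z, components, var = pca(X, 1)
```

Here the first principal direction should align closely with the direction (1, 2), and the 2D points are summarized by a single coordinate each.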

57 - EX 2. Dimensionality Reduction (2/2)

Applications in computer vision

Subspace as constraints

Structure from motion [Tomasi and Kanade IJCV 92], Optical flow

[Irani IJCV 02], Layer extraction [Ke and Kanade CVPR 01], Face

alignment [Saragih et al. ICCV 09]

Face recognition (e.g., PCA, LDA, LPP)

PCA [Turk and Pentland PAMI 91], LDA [Belhumeur et al. PAMI 97],

LPP [He et al. PAMI 05], Random [Wright et al. PAMI 09]

Motion segmentation

subspace separation [Kanatani ICCV 01] [Yan and Pollefeys ECCV

06] [Rao et al. CVPR 08] [Lauer and Schnorr ICCV 09]

Lighting

linear subspace [Belhumeur and Kriegman IJCV 98] [Georghiades

et al. PAMI 01] [Lee et al. PAMI 05] [Basri and Jacobs PAMI 03]

Visual tracking

incremental subspace learning [Ross et al. IJCV 08] [Li et al. CVPR

08]

58 - EX 3. Spectral Clustering (1/3)

Why is spectral clustering popular?

Can be solved efficiently by standard linear algebra software

Very often outperforms traditional clustering algorithms

Spectral clustering algorithm

Input: a set of n data points and the number of clusters k

1. Construct a similarity graph, e.g., ε-neighborhood, k-nearest neighbor, fully connected

2. Construct the graph Laplacian, e.g., unnormalized ($L$) or normalized ($L_{rw}$, $L_{sym}$)

3. Compute the first k eigenvectors of L (those with the smallest eigenvalues), $v_1, \dots, v_k$

4. Let $V \in \mathbb{R}^{n \times k}$ be the matrix containing $v_1, \dots, v_k$ as columns

5. Cluster the row vectors $y_i$ of $V$ ($i = 1, \dots, n$) with the k-means algorithm into clusters $C_1, \dots, C_k$

Output: Clusters $A_1, \dots, A_k$ with $A_i = \{j \mid y_j \in C_i\}$
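
The steps above can be sketched in a few lines of NumPy. This version assumes a fully connected graph with Gaussian weights, the unnormalized Laplacian, and a tiny k-means with deterministic farthest-point initialization. The bandwidth `sigma`, the helper names, and the toy data are illustrative assumptions.

```python
import numpy as np

def kmeans(Y, k, n_iter=50):
    """Small k-means with farthest-point initialization (kept deterministic)."""
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels

def spectral_clustering(X, k, sigma=1.0):
    # Step 1: fully connected similarity graph with Gaussian weights.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Step 2: unnormalized graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W
    # Step 3: eigh returns eigenvalues in ascending order.
    _, vecs = np.linalg.eigh(L)
    # Step 4: V holds the k eigenvectors with the smallest eigenvalues.
    V = vecs[:, :k]
    # Step 5: cluster the rows of V with k-means.
    return kmeans(V, k)

# Assumed toy data: two well-separated blobs of 10 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (10, 2)), rng.normal(5.0, 0.3, (10, 2))])
labels = spectral_clustering(X, k=2)
```

For well-separated groups the rows of V are nearly constant within each group, so even this simple k-means separates them.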

59 - EX 3. Spectral Clustering (2/3)

Why does it work?

Graph Cut Point of View: Construct a partition that minimizes the weight across the cut (the well-known mincut problem) while balancing the clusters (e.g., RatioCut, Normalized cut).

Random Walks Point of View: When minimizing Ncut, we actually look for a cut through the graph such that a random walk seldom transitions from one cluster to another.

Perturbation Theory Point of View: The distance between eigenvectors from the ideal and nearly ideal graph Laplacian is bounded by a constant times a norm of the error matrix. If the perturbations are small enough, then the k-means algorithm will still separate the groups from each other.

60 - EX 3. Spectral Clustering (3/3)

[Shi and Malik PAMI 02]

Eigenvectors carry contour information.

61 - EX 4. Probabilistic Graphical Model (1/2)

What are probabilistic graphical models?

A marriage between probability theory and graph theory.

A natural tool for dealing with uncertainty and complexity.

They provide a way to view many classical probabilistic systems (e.g., mixture models, factor analysis, hidden Markov models, Kalman filters, and Ising models) as instances of a common underlying formalism.

62 - EX 4. Probabilistic Graphical Model (2/2)

63 - EX 5. Structured Prediction (1/2)

What is structured prediction?

Structured prediction is a framework for solving problems of

classification or regression in which the output variables are

mutually dependent or constrained.

Lots of examples

Natural language parsing

Machine translation

Object segmentation

Gene prediction

Protein alignment

Numerous tasks in computational linguistics, speech, vision,

biology.
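
To illustrate what "mutually dependent outputs" means, here is a small sketch of chain-structured prediction with Viterbi decoding. The score tables are made-up numbers and the function name is an assumption. The point is only that the jointly best label sequence can differ from the labels picked independently at each position.

```python
import numpy as np

def viterbi(unary, pairwise):
    """Best label sequence for a chain model.

    unary:    (T, K) score of each label at each position
    pairwise: (K, K) score of moving from label i to label j
    """
    T, K = unary.shape
    score = unary[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + pairwise   # cand[i, j]: previous i, current j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + unary[t]
    labels = [int(score.argmax())]
    for t in range(T - 1, 0, -1):          # follow back-pointers
        labels.append(int(back[t, labels[-1]]))
    return labels[::-1]

# Made-up scores: position 1 weakly prefers label 1, neighbors prefer to agree.
unary = np.array([[2.0, 0.0], [0.0, 0.5], [2.0, 0.0]])
pairwise = np.array([[1.0, -1.0], [-1.0, 1.0]])

independent = unary.argmax(axis=1).tolist()   # ignores the dependencies
joint = viterbi(unary, pairwise)              # accounts for them
```

With these scores, the independent decision flips the middle label, while the joint decoding keeps all three labels consistent.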

64 - EX 5. Structured Prediction (2/2)

Applications [Lampert et al. ECCV 08] [Desai et al. ICCV 09]

65 - EX 6. Bilateral Filtering (1/3)

What’s Bilateral Filtering?

A technique to smooth images while preserving edges

Ubiquitous in image processing, computational photography
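
A minimal 1D sketch of the idea: each output sample is a weighted average of its neighbors, where the weight is the product of a spatial Gaussian (closeness in position) and a range Gaussian (closeness in value). The parameter values and the test signal are assumptions for illustration. Image versions apply the same weighting over 2D windows.

```python
import numpy as np

def bilateral_filter_1d(signal, radius=3, sigma_s=2.0, sigma_r=0.1):
    """Edge-preserving smoothing of a 1D signal."""
    signal = np.asarray(signal, dtype=float)
    out = np.empty_like(signal)
    offsets = np.arange(-radius, radius + 1)
    spatial = np.exp(-offsets ** 2 / (2.0 * sigma_s ** 2))
    padded = np.pad(signal, radius, mode="edge")
    for i in range(len(signal)):
        window = padded[i:i + 2 * radius + 1]
        # Samples with very different values get almost no weight.
        range_w = np.exp(-(window - signal[i]) ** 2 / (2.0 * sigma_r ** 2))
        w = spatial * range_w
        out[i] = (w * window).sum() / w.sum()
    return out

# Assumed test signal: a unit step with a little Gaussian noise.
rng = np.random.default_rng(0)
step = np.concatenate([np.zeros(50), np.ones(50)]) + 0.02 * rng.standard_normal(100)
smoothed = bilateral_filter_1d(step)
```

Within each flat region the range weights are close to 1, so the filter behaves like a Gaussian blur. Across the step they are close to 0, so the two sides are not mixed.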

66 - EX 6. Bilateral Filtering (2/3)

[Bennett and McMillan SIGGRAPH 05] [Eisemann and Durand SIGGRAPH 04] [Jones et al. SIGGRAPH 03] [Winnemöller et al. SIGGRAPH 06] [Bae et al. SIGGRAPH 02]

67 - EX 6. Bilateral Filtering (3/3)

How does the bilateral filter relate to other methods?

Interpretation

The bilateral filter is equivalent to mode filtering in local histograms

The bilateral filter can be interpreted in terms of robust statistics, since it is related to a cost function

The bilateral filter is a discretization of a particular kind of PDE-based anisotropic diffusion

68 - EX 7. Sparse Representation (1/4)

Ideas

Natural signals (e.g. audio, image) usually admit sparse

representation (i.e., can be well represented by a linear

combination of a few atom signals)

Successfully applied to various areas in signal/image precessing,

vision and graphics.
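A rough sketch of "a linear combination of a few atom signals", assuming greedy matching pursuit over a small hand-made dictionary. Matching pursuit is swapped in here as a simple stand-in; the cited works use their own solvers, and the dictionary and signal below are hypothetical.

```python
def matching_pursuit(signal, atoms, n_nonzero):
    """Greedy sparse coding: signal ~ sum_k coeffs[k] * atoms[k].

    Assumes every atom has unit Euclidean norm.
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        dots = [sum(a * r for a, r in zip(atom, residual)) for atom in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(dots[i]))
        coeffs[k] += dots[k]
        residual = [r - dots[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual


# Overcomplete toy dictionary in R^4: the standard basis plus two extra atoms.
atoms = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.5, 0.5, 0.5, 0.5],    # constant atom
    [0.5, 0.5, -0.5, -0.5],  # step atom
]
# Dense in the standard basis, but only two atoms are needed in the dictionary.
signal = [2.5, 2.5, 0.5, 0.5]
coeffs, residual = matching_pursuit(signal, atoms, n_nonzero=2)
```

With a dictionary that suits the signal, two coefficients reproduce all four samples, which is the property the applications on the following slides rely on.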

69 - EX 7. Sparse Representation (2/4)

Image Restoration [Aharon et al. TSP 06] [Mairal et al. TIP 08]

Denoising

Inpainting

Demosaicing

70 - EX 7. Sparse Representation (3/4)

Classification [Wright et al. PAMI 09] [Mairal et al. CVPR ECCV NIPS 08]

face recognition

edge detection

texture classification

pixel classification

71 - EX 7. Sparse Representation (4/4)

Compressive sensing [Donoho TIT 06] [Candes and Tao TIT 05 06]

and more (e.g., low-rank matrix completion, robust PCA)

72 - Outline

1 Introduction

2 Five ways to come up with new ideas

Seek different dimensions: neXt = X^d

Combine two or more topics: neXt = X + Y

Re-think the research directions: neXt = X̄

Use powerful tools, find suitable problems: neXt = X ↑

Add an appropriate adjective: neXt = Adj + X

3 What is a bad idea?

73 - Add an appropriate adjective

neXt = Adj + X

There is only one religion, though

there are a hundred versions of it. -

George Bernard Shaw

74 - Add an appropriate adjective

neXt = Adj + X

What kinds of adjectives can we use?

linear ⇔ non-linear

generative/reconstructive ⇔ discriminative

rule-based / hand-designed ⇔ learning-based

single scale ⇔ multi-scale

single step ⇔ progressive

batch processing ⇔ incremental / online processing

fixed ⇔ adaptive / dynamic to data

parametric ⇔ non-parametric

Z - invariant (Z = translation / scale / rotation / noise, facial

expression / pose / lighting / occlusion)

Z - aware (Z = motion / content / semantic / context / occlusion)

75 - EX 1. Linear ⇔ Non-linear

Hard to find a straight line to separate them into two clusters?

Ideas

Linear methods may not capture the nonlinear structure in the

original data representation

Nonlinear methods

Kernel tricks (e.g., Kernel PCA, Kernel LDA, Kernel SVM, etc.)

Manifold learning (e.g., ISOMAP, LLE, Laplacian eigenmap, etc.)
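A small sketch of why the kernel trick helps, assuming two concentric rings of 2D points and the degree-2 polynomial kernel. The data and the separating rule `w`, `b` are hypothetical and chosen by hand. No straight line separates the rings in the input space, but a linear rule does in the feature space that the kernel implicitly works in.

```python
import math


def phi(p):
    """Explicit feature map whose inner product equals the kernel below."""
    x1, x2 = p
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(p, q):
    """Degree-2 polynomial kernel k(p, q) = (p . q)^2, computed in 2D."""
    return (p[0] * q[0] + p[1] * q[1]) ** 2


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Two concentric rings: not linearly separable in the original 2D space.
angles = [2 * math.pi * k / 8 for k in range(8)]
inner = [(0.5 * math.cos(a), 0.5 * math.sin(a)) for a in angles]
outer = [(2.0 * math.cos(a), 2.0 * math.sin(a)) for a in angles]

# A linear rule in feature space: w . phi(p) > b. With w = (1, 0, 1) the
# left-hand side is the squared radius, so a plane separates the rings.
w, b = (1.0, 0.0, 1.0), 1.0
inner_side = [dot(w, phi(p)) > b for p in inner]
outer_side = [dot(w, phi(p)) > b for p in outer]
```

The kernel evaluates the feature-space inner product without ever building `phi(p)`, which is what makes Kernel PCA, Kernel LDA, and Kernel SVM practical for high-dimensional feature maps.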

76 - EX 2. Generative ⇔ Discriminative

Classification task : X → Y

Generative classifiers estimate class-conditional pdfs P(X|Y) and

prior probabilities P(Y)

Naive Bayes, Mixtures of Gaussians, Mixtures of experts, Hidden

Markov Models (HMM), Sigmoidal belief networks, Bayesian

networks, Markov random fields (MRF)

Discriminative classifiers estimate posterior probabilities P(Y|X)

Logistic regression, SVMs, Traditional neural networks, Nearest

neighbor, Conditional Random Fields (CRF)

Bayes’ rule

P(Y|X) = P(X|Y) P(Y) / P(X)

Two different perspectives in viewing a problem
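The link between the two perspectives can be sketched with a one-dimensional toy problem, assuming Gaussian class-conditionals with equal variance. The class parameters and priors are hypothetical. The generative route models P(X|Y) and P(Y) and applies Bayes' rule; for this particular setup the resulting posterior has exactly the logistic form that a discriminative model would fit directly.

```python
import math


def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def posterior(x, class_params, priors):
    """P(Y | X = x) from P(X | Y) and P(Y) via Bayes' rule."""
    joint = {y: gaussian_pdf(x, *class_params[y]) * priors[y] for y in priors}
    evidence = sum(joint.values())  # P(X = x)
    return {y: joint[y] / evidence for y in joint}


def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))


# Hypothetical generative model: (mean, std) per class and equal priors.
class_params = {"a": (0.0, 1.0), "b": (2.0, 1.0)}
priors = {"a": 0.5, "b": 0.5}

post = posterior(1.0, class_params, priors)

# Discriminative view of the same posterior: P(b | x) = logistic(2x - 2),
# which follows algebraically from the Gaussians above.
x = 0.3
generative_pb = posterior(x, class_params, priors)["b"]
discriminative_pb = logistic(2 * x - 2)
```

A discriminative learner would estimate the slope and offset of the logistic function from data without modelling P(X|Y) at all.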

77 - EX 3. Rule-based / Hand-designed ⇔ Learning-based

Hard to find rules to recognize digits?

Ideas

It may be difficult to design a set of rules for certain tasks, such as

handwritten digit recognition

Turn to machine learning methods instead

78 - EX 4. Single scale ⇔ Multi-scale

[Zelnik-Manor and Perona NIPS 04]

Ideas

We live in a multi-scale world (atom ↔ universe)

Image pyramids / scale-space theory / wavelet representation →

all attempt to capture the multi-scale properties of signals/images.
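A minimal sketch of a pyramid, assuming a 1D signal, a [1, 2, 1] / 4 smoothing kernel, and downsampling by two. The function names and the test signal are illustrative; image pyramids apply the same idea along both image axes.

```python
def downsample(signal):
    """Smooth with a [1, 2, 1] / 4 kernel (edges replicated), then halve."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    smoothed = [(padded[i - 1] + 2 * padded[i] + padded[i + 1]) / 4
                for i in range(1, len(padded) - 1)]
    return smoothed[::2]


def pyramid(signal, n_levels):
    """Return the signal at n_levels scales, finest first."""
    levels = [list(signal)]
    for _ in range(n_levels - 1):
        levels.append(downsample(levels[-1]))
    return levels


# Hypothetical step signal; the step stays visible at every scale.
levels = pyramid([0, 0, 0, 0, 8, 8, 8, 8], n_levels=3)
```

Running a single-scale method on each level is one common way to turn it into a multi-scale method.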

79 - EX 5. Single step ⇔ Progressive

[Yuan et al. SIGGRAPH 08]

Ideas

Some problems are difficult to solve in one step → solve them

progressively

80 - EX 6. Batch processing ⇔ Incremental / Online processing

Ideas

Online methods can handle potentially infinite data samples and

time-varying data

Examples

PCA → Incremental PCA (many variants)

LDA → Incremental LDA (many variants)

SVM → Incremental and decremental SVM [Cauwenberghs and

Poggio NIPS 01]

Dictionary learning (e.g., K-SVD) [Aharon and Elad TSP 06] →

Online dictionary learning [Mairal et al. ICML/JMLR 09]

AdaBoost → Online boosting [Grabner and Bischof CVPR 06]

Multiple instance boosting → Online multiple instance boosting

[Babenko et al. CVPR 09]
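The batch-to-online conversion can be illustrated on the simplest possible statistic. This sketch assumes Welford's running mean and variance update, which is not one of the methods cited above; it is used only because it shows the pattern (constant memory, one sample at a time) in a few lines.

```python
class RunningStats:
    """Mean and variance updated one sample at a time (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of the samples seen so far."""
        return self.m2 / self.n if self.n else 0.0


def batch_stats(samples):
    """Batch version: needs all samples in memory at once."""
    mean = sum(samples) / len(samples)
    variance = sum((s - mean) ** 2 for s in samples) / len(samples)
    return mean, variance


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical stream
stats = RunningStats()
for sample in data:
    stats.update(sample)  # each sample can be discarded after this call
batch_mean, batch_variance = batch_stats(data)
```

Both routes give the same result, but the online version never stores the stream, which is what allows it to handle unbounded data.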

81 - EX 7. Fixed ⇔ Adaptive / Dynamic

[Elad and Aharon TIP 06]

Ideas

Adaptive approaches usually outperform the predefined/fixed

ones.

82 - EX 8. Parametric ⇔ Non-parametric

Probability density estimation

Parametric

Assumes a specific functional form with parameter θ

e.g., Gaussian distribution with unknown mean and variance, mixture

of Gaussians

Parameter estimation

Estimative approach: p(x) = p(x|θ_best)

Bayesian approach: p(x) = ∫ a(θ) p(x|θ) dθ

Non-parametric

Do not assume a specific form of the probability distributions

e.g., Histogram, kernel density estimation (or Parzen window method)
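The contrast can be sketched on bimodal data, assuming a single Gaussian as the parametric model and a Gaussian-kernel Parzen window as the non-parametric one. The samples and the bandwidth are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def fit_gaussian(samples):
    """Parametric: maximum-likelihood estimate of theta = (mean, std)."""
    mean = sum(samples) / len(samples)
    variance = sum((s - mean) ** 2 for s in samples) / len(samples)
    return mean, math.sqrt(variance)


def kde(x, samples, bandwidth):
    """Non-parametric: average of Gaussian bumps centred on the samples."""
    return sum(gaussian_pdf(x, s, bandwidth) for s in samples) / len(samples)


# Hypothetical bimodal data with clusters around -2 and +2.
samples = [-2.1, -2.0, -1.9, 1.9, 2.0, 2.1]
mean, std = fit_gaussian(samples)

parametric_at_0 = gaussian_pdf(0.0, mean, std)
parametric_at_2 = gaussian_pdf(2.0, mean, std)
kde_at_0 = kde(0.0, samples, bandwidth=0.3)
kde_at_2 = kde(2.0, samples, bandwidth=0.3)
```

The single Gaussian places its peak in the empty region between the clusters because its assumed form cannot represent two modes, while the kernel estimate follows the data at the cost of keeping every sample and choosing a bandwidth.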

83 - EX 9. Z - invariant

Make your method robust to potential sources of performance degradation

noise (e.g., Gaussian additive noise, impulse noise, non-uniform

noise) (e.g., image restoration)

translation shift (e.g., near-duplicate image/video detection, image

search)

scale change (e.g., object detection, feature extraction)

perspective distortion (e.g., feature extraction)

deformation (e.g., non-rigid registration, part-based object

detection)

pose variation (e.g., human pose estimation)

lighting variation (e.g., face recognition)

partial occlusion (e.g., object detection and recognition)
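One concrete instance of the pattern, sketched for lighting variation: assuming the lighting change acts as a gain and an offset on pixel intensities, normalized cross-correlation is unaffected by it while a raw sum of squared differences is not. The patch values and the gain/offset are hypothetical.

```python
import math


def normalize(patch):
    """Subtract the mean and scale to unit norm (patch must not be constant)."""
    mean = sum(patch) / len(patch)
    centered = [p - mean for p in patch]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def ncc(a, b):
    """Normalized cross-correlation: invariant to gain and offset changes."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def ssd(a, b):
    """Sum of squared differences: sensitive to any intensity change."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


patch = [10.0, 20.0, 15.0, 40.0, 35.0]
brighter = [1.5 * p + 30.0 for p in patch]  # same content, different lighting

similarity = ncc(patch, brighter)
raw_difference = ssd(patch, brighter)
```

The same recipe (identify the nuisance factor, then normalize it away or design a measure that ignores it) applies to the other entries in the list above.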

84 - EX 10. Z - aware

[Wang et al. SIGGRAPH Asia 09] [Wang et al. SIGGRAPH 10]

motion-aware video resizing

Make your method aware of potential failure cases

Motion (e.g., video processing)

Content (e.g., image processing)

Semantic (e.g., image and video indexing/retrieval)

Context (e.g., image understanding)

Occlusion (e.g., detection/tracking)

85 - Outline

1 Introduction

2 Five ways to come up with new ideas

Seek different dimensions: neXt = X^d

Combine two or more topics: neXt = X + Y

Re-think the research directions: neXt = X̄

Use powerful tools, find suitable problems: neXt = X ↑

Add an appropriate adjective: neXt = Adj + X

3 What is a bad idea?

86 - What is a bad idea?

Naive combination of two or more methods

Avoid a pipeline system paper

Blind application of tools

Use X feature and Y classifier without motivation and justification

Follow the hype

Too many competitors

Doing something just because it can be done

Do the right things, not just do things right

87 - Thank you for your kind attention.

Questions?

For more complete materials, please visit my blog

http://jbhuang0604.blogspot.com/

94