This page presents the contents of http://www.slideshare.net/ren4yu/image-retrieval-with-fisher-vectors-of-binary-features-miru14.

Uploaded about two years ago (2014/08/02), in Technology.


Recently, the Fisher vector representation of local features has attracted much attention because of its effectiveness in both image classification and image retrieval. Another trend in the area of image retrieval is the use of binary features such as ORB, FREAK, and BRISK. Considering the significant accuracy improvements that the Fisher vector of continuous feature descriptors has brought to both image classification and retrieval, applying the Fisher vector to binary features should yield the same benefits for binary-feature-based image retrieval and classification. In this paper, we derive a closed-form approximation of the Fisher vector of binary features, which are modeled by a Bernoulli mixture model. Experiments show that the Fisher vector representation improves the accuracy of image retrieval by 25% compared with a bag-of-binary-words approach.

- Image Retrieval with Fisher Vectors of Binary Features
Yusuke Uchida, Shigeyuki Sakazawa
KDDI R&D Laboratories, Inc.

- Image retrieval using local features

• Local invariant features:
– Robust against occlusion, illumination change, viewpoint change, and so on
• Applications:
– Product search (Amazon Flow), landmark recognition (Google Goggles), augmented reality (Qualcomm Vuforia), …

2014/8/1

- Trends in image retrieval using local features

• 1999: SIFT [Lowe,ICCV’99]

• 2003: SIFT + Bag-of-visual words [Sivic+,ICCV’03]

• 2007: SIFT + Fisher vector [Perronnin+,CVPR’07,ECCV’10]

– New effective image representation

• 2011: Local binary features (ORB [Rublee+,ICCV’11], FREAK, BRISK)

– Efficient alternatives to SIFT or SURF

• In this presentation:
– Propose the Fisher vector of binary features for image retrieval
– Model binary features by a Bernoulli mixture model (BMM)
– Derive a closed-form approximation of the Fisher vector of the BMM
– Apply a new normalization method to the Fisher vector

- Pipeline of image retrieval using local features

Region detection → Feature description → Aggregation

A set of feature vectors X extracted from an image is aggregated into a single vector representation, which is then fed to a classifier (e.g. SVM) or used for similarity search.

- Position of this research

Descriptor type × aggregation method:

                                   Bag-of-visual-words   Fisher vector
  Continuous (SIFT, SURF)          [1]                   [2, 3]
  Binary (ORB, FREAK, BRISK)       [4]                   This research

(Continuous descriptors are accurate; binary descriptors are fast.)

[1] J. Sivic and A. Zisserman, "Video Google: A text retrieval approach to object matching in videos," in Proc. of ICCV'03.
[2] F. Perronnin and C. Dance, "Fisher kernels on visual vocabularies for image categorization," in Proc. of CVPR'07.
[3] F. Perronnin et al., "Improving the Fisher kernel for large-scale image classification," in Proc. of ECCV'10.
[4] D. Galvez-Lopez and J. D. Tardos, "Real-time loop detection with bags of binary words," in Proc. of IROS'11.

- Fisher kernel [Jaakkola+, NIPS'98]

• The generation process of X is modeled by a probability density function p(X|λ) with a parameter set λ
• Describe X by the gradient of the log-likelihood function L(X|λ) = log p(X|λ) (the Fisher score)
• Similarity between X and X' is defined by the Fisher kernel K(X, X'):

K(X, X') = ∇_λ L(X|λ)^T F_λ^{-1} ∇_λ L(X'|λ)

where ∇_λ L(·|λ) is the Fisher score (gradient of the log-likelihood function) and the Fisher information matrix is F_λ = E[ ∇_λ L(x|λ) ∇_λ L(x|λ)^T ]

[5] T. Jaakkola and D. Haussler, "Exploiting generative models in discriminative classifiers," in Proc. of NIPS'98.

- Fisher vector [Perronnin+, CVPR'07]

• Explicit feature mapping for the Fisher kernel:

K(X, X') = ∇_λ L(X|λ)^T F_λ^{-1} ∇_λ L(X'|λ)

– As the Fisher information matrix (FIM) F_λ is positive semidefinite and symmetric, it has a Cholesky decomposition: F_λ^{-1} = L_λ^T L_λ
– Thus the Fisher kernel can be rewritten as a dot product between Fisher vectors z_X and z_X':

K(X, X') = z_X^T z_X',  where z_X = L_λ ∇_λ L(X|λ)  (decomposed FIM × Fisher score)

- Fisher vector of GMM [Perronnin+, CVPR'07]

• SIFT features are modeled by a Gaussian mixture model (GMM):

p(X|λ) = ∏_{t=1}^{T} p(x_t|λ),  p(x_t|λ) = Σ_{i=1}^{N} w_i N(x_t; μ_i, Σ_i)

• Closed-form approximation of the Fisher vector of the GMM under the following assumptions:
1. The Fisher information matrix F is diagonal
2. The number of features extracted from an image is constant and equal to T
3. The posterior probability γ_t(i) is peaky
• Compared with bag-of-visual-words, the Fisher vector contains higher-order information
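Under these assumptions, the GMM Fisher vector component with respect to the means takes the well-known closed form g_{μ,i} = (1/(T√w_i)) Σ_t γ_t(i) (x_t − μ_i)/σ_i [2]. A minimal pure-Python sketch (function names are my own; diagonal covariances assumed, and this is an illustration, not the authors' code):

```python
import math

def gmm_posteriors(x, weights, means, variances):
    """Posterior gamma(i) of each diagonal-Gaussian component for one feature x."""
    scores = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xd, md, vd in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vd) + (xd - md) ** 2 / vd)
        scores.append(log_p)
    m = max(scores)                      # log-sum-exp for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def fisher_vector_mu(X, weights, means, variances):
    """Mean-gradient FV component: (1/(T*sqrt(w_i))) sum_t gamma_t(i)(x_t - mu_i)/sigma_i."""
    N, D, T = len(weights), len(means[0]), len(X)
    fv = [[0.0] * D for _ in range(N)]
    for x in X:
        gamma = gmm_posteriors(x, weights, means, variances)
        for i in range(N):
            for d in range(D):
                fv[i][d] += gamma[i] * (x[d] - means[i][d]) / math.sqrt(variances[i][d])
    return [[v / (T * math.sqrt(weights[i])) for v in fv[i]] for i in range(N)]
```

A symmetric toy input (two points mirrored around the single component's mean) yields a zero gradient, as expected.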

- Local binary features [Rublee+, ICCV'11]

• Local binary features: ORB, BRISK, FREAK, and many others
– One or two orders of magnitude faster than SIFT or SURF
– Multi-scale FAST detector or its variants
– Binary descriptor based on binary tests on pixel luminance within the local feature region
• Binary tests are defined by pairs of positions
• If the luminance at the first position is brighter than the luminance at the second position, the test generates bit '1'
• Resulting in a 256-bit binary vector (0, 0, 1, 0, 1, …, 1)
(Figure: a part of the binary tests of ORB)
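A toy sketch of how such a binary test works. The patch and the test positions below are made up for illustration; real ORB applies a learned pattern of 256 tests to a smoothed patch around each keypoint:

```python
def binary_descriptor(patch, test_pairs):
    """Compare luminance at pairs of positions; bit '1' if the first is brighter."""
    bits = []
    for (r1, c1), (r2, c2) in test_pairs:
        bits.append(1 if patch[r1][c1] > patch[r2][c2] else 0)
    return bits

# Hypothetical 3x3 grayscale patch and three hand-picked test pairs.
patch = [
    [10, 200, 30],
    [40, 50, 60],
    [70, 80, 90],
]
pairs = [((0, 1), (0, 0)), ((0, 0), (2, 2)), ((2, 0), (1, 0))]
print(binary_descriptor(patch, pairs))  # [1, 0, 1]
```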

- Modeling binary features by BMM

• Model binary features by a Bernoulli mixture model (BMM)

Notation:
– X: a set of T binary features X = (x_1, …, x_T) with dimension D (D bits)
– λ: a set of parameters λ = {w_i, μ_id | i = 1..N, d = 1..D}

p(X|λ) = ∏_{t=1}^{T} p(x_t|λ)   (naive Bayes assumption)

p(x_t|λ) = Σ_{i=1}^{N} w_i p_i(x_t|λ)   (each feature is generated from one of the N components)

p_i(x_t|λ) = ∏_{d=1}^{D} μ_id^{x_td} (1 − μ_id)^{1 − x_td}   (single multivariate Bernoulli distribution)
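The BMM density and the component posterior defined above can be sketched in plain Python (log-domain for numerical stability; function names are illustrative):

```python
import math

def bernoulli_log_pdf(x, mu):
    """log p_i(x|λ) = sum_d x_d log μ_d + (1 - x_d) log(1 - μ_d)."""
    return sum(xd * math.log(md) + (1 - xd) * math.log(1 - md)
               for xd, md in zip(x, mu))

def bmm_posteriors(x, weights, mus):
    """γ_t(i) = w_i p_i(x|λ) / Σ_j w_j p_j(x|λ), computed via log-sum-exp."""
    logs = [math.log(w) + bernoulli_log_pdf(x, mu) for w, mu in zip(weights, mus)]
    m = max(logs)
    exps = [math.exp(v - m) for v in logs]
    z = sum(exps)
    return [e / z for e in exps]
```

For a two-component mixture with mirrored Bernoulli parameters, a bit vector close to one prototype gets most of the posterior mass.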

- Visualizing clustering results of BMM

• The parameters λ are estimated by the EM algorithm (for N = 32) using 1M training ORB features
(Figure: for four of the N = 32 components, the binary tests with the top-5 probabilities of generating bit '0' and bit '1', shown against all binary tests defined in ORB)
• The mixture model successfully captures the underlying bit correlations
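A minimal EM sketch for fitting a BMM on a toy dataset (the slide fits N = 32 components to 1M ORB features; this is a sketch under random initialization, not the authors' implementation):

```python
import math, random

def em_bmm(X, N, iters=100, seed=0):
    """Toy EM for a Bernoulli mixture model over binary vectors X."""
    rng = random.Random(seed)
    D = len(X[0])
    w = [1.0 / N] * N
    # Random init away from 0/1 so logs stay finite.
    mu = [[rng.uniform(0.25, 0.75) for _ in range(D)] for _ in range(N)]
    for _ in range(iters):
        # E-step: responsibilities gamma_t(i), via log-sum-exp.
        R = []
        for x in X:
            logs = [math.log(w[i]) +
                    sum(xd * math.log(mu[i][d]) + (1 - xd) * math.log(1 - mu[i][d])
                        for d, xd in enumerate(x))
                    for i in range(N)]
            m = max(logs)
            e = [math.exp(v - m) for v in logs]
            z = sum(e)
            R.append([v / z for v in e])
        # M-step: update mixture weights and Bernoulli parameters (clipped away from 0/1).
        for i in range(N):
            Ni = sum(r[i] for r in R)
            w[i] = Ni / len(X)
            for d in range(D):
                p = sum(r[i] * x[d] for r, x in zip(R, X)) / Ni
                mu[i][d] = min(max(p, 1e-3), 1 - 1e-3)
    return w, mu
```

On two clearly separated bit patterns, the two components converge to the two prototypes.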

- Fisher vector of BMM

• Definition of the Fisher vector:

z_X = L_λ ∇_λ L(X|λ)   (decomposed FIM × Fisher score)

L(X|λ) = Σ_t L(x_t|λ),  L(x_t|λ) = log p(x_t|λ)

p(x_t|λ) = Σ_{i=1}^{N} w_i p_i(x_t|λ),  p_i(x_t|λ) = ∏_{d=1}^{D} μ_id^{x_td} (1 − μ_id)^{1 − x_td}

• Fisher score w.r.t. μ_id:

∂L(x_t|λ)/∂μ_id = γ_t(i) · (−1)^{1 − x_td} / ( μ_id^{x_td} (1 − μ_id)^{1 − x_td} )

where the posterior probability is

γ_t(i) = p(i|x_t, λ) = w_i p_i(x_t|λ) / Σ_{j=1}^{N} w_j p_j(x_t|λ)
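The Fisher score with respect to μ_id can be checked numerically with a small sketch (illustrative function name; the score for an image sums over its T features, and the sign pattern follows from (−1)^{1−x_td}, i.e. x_td/μ_id − (1 − x_td)/(1 − μ_id)):

```python
import math

def bmm_fisher_score_mu(X, weights, mus):
    """Fisher score dL(X|λ)/dμ_id = Σ_t γ_t(i) (x_td/μ_id − (1−x_td)/(1−μ_id))."""
    N, D = len(weights), len(mus[0])
    score = [[0.0] * D for _ in range(N)]
    for x in X:
        # Posterior gamma_t(i) via log-sum-exp.
        logs = [math.log(w) +
                sum(xd * math.log(mu[d]) + (1 - xd) * math.log(1 - mu[d])
                    for d, xd in enumerate(x))
                for w, mu in zip(weights, mus)]
        m = max(logs)
        e = [math.exp(v - m) for v in logs]
        z = sum(e)
        gamma = [v / z for v in e]
        for i in range(N):
            for d in range(D):
                score[i][d] += gamma[i] * (x[d] / mus[i][d]
                                           - (1 - x[d]) / (1 - mus[i][d]))
    return score
```

For a single component with μ = 0.5, a '1' bit and a '0' bit contribute +2 and −2 respectively, so a balanced pair of features cancels to zero.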

- Fisher vector of BMM

• Fisher information w.r.t. μ_id (f_μid): under the assumption that the posterior probability is peaky, the cross terms of the Fisher score vanish (= 0), giving a closed-form diagonal approximation of the FIM
(The full derivation was shown as a figure in the original slide.)

- Posterior probability p(i|x_t, λ)

• Histogram of max_i p(i|x_t, λ) (N = 256)
(Figure: histogram over [0, 1]; the mass concentrates near 1)
• Peaky!

- Vector normalization

• Normalization is an essential part of the Fisher vector representation [3]
• Power normalization [3] (originally proposed for FV):
– ẑ_i = sgn(z_i) |z_i|^α  (α = 0.5)
• L2 normalization [3] (originally proposed for FV):
– ẑ_i = z_i / sqrt( Σ_j z_j^2 )
• Intra normalization [6] (originally proposed for VLAD, not for FV):
– perform L2 normalization within each BMM component

[3] F. Perronnin et al., "Improving the Fisher kernel for large-scale image classification," in Proc. of ECCV'10.
[6] R. Arandjelovic and A. Zisserman, "All about VLAD," in Proc. of CVPR'13.
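The three normalizations are straightforward to sketch (α = 0.5 as in [3]; function names are illustrative, and intra normalization here takes the vector pre-split into per-component blocks):

```python
import math

def power_normalize(z, alpha=0.5):
    """Power normalization: sgn(z_i) |z_i|^alpha."""
    return [math.copysign(abs(v) ** alpha, v) for v in z]

def l2_normalize(z):
    """Global L2 normalization: z / ||z||_2."""
    n = math.sqrt(sum(v * v for v in z))
    return [v / n for v in z] if n > 0 else z

def intra_normalize(blocks):
    """L2-normalize within each BMM component's block, then concatenate."""
    return [v for block in blocks for v in l2_normalize(block)]

print(power_normalize([4.0, -4.0]))              # [2.0, -2.0]
print(intra_normalize([[3.0, 4.0], [0.0, 5.0]])) # [0.6, 0.8, 0.0, 1.0]
```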

- Experimental setup

• Dataset: Stanford Mobile Visual Search
– http://www.stanford.edu/~dmchen/mvs.html
– The CD class is used for evaluation: 100 reference images, 400 query images
• Performance measure: mean average precision (MAP)
• Binary feature: ORB (OpenCV implementation, 4 scales, 900 features/image)
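Mean average precision, the measure used here, can be sketched as follows (illustrative helper names; how the ranked lists are produced is omitted):

```python
def average_precision(ranked_ids, relevant_ids):
    """AP for one query: mean of precision@k taken at the ranks of relevant items."""
    relevant = set(relevant_ids)
    hits, total = 0, 0.0
    for k, rid in enumerate(ranked_ids, start=1):
        if rid in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(all_rankings, all_relevant):
    """MAP: average of per-query AP values."""
    aps = [average_precision(r, g) for r, g in zip(all_rankings, all_relevant)]
    return sum(aps) / len(aps)

# Hypothetical query: relevant items 'a' and 'c' ranked 1st and 3rd.
print(average_precision(['a', 'b', 'c'], {'a', 'c'}))  # (1/1 + 2/3) / 2
```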

- Experimental results (1)

• Compare the proposed Fisher vector with BoVW (N = 1024)
• Evaluate normalization methods (P = power, In = intra normalization)
(Figure: MAP vs. number of mixture components for In-Norm FV, Imp. FV, BoVW, and Pure FV; higher is better)
• The Fisher vector without any normalization achieves poor results
• Power and/or L2 normalization significantly improves the FV
• Intra normalization outperforms the others for all N!

- Experimental results (2)

• Add independent images to the database as distractors
(Figure: retrieval performance vs. database size; legend: Proposed FV)
• The Fisher vector achieves better performance at all database sizes
• The degradation of the Fisher vector is relatively small

- Summary

• Proposed the Fisher vector of binary features for image retrieval
– Model binary features by a Bernoulli mixture model (BMM)
– Derive a closed-form approximation of the Fisher vector of the BMM
– Apply a new normalization method to the Fisher vector
• Future work
– Encode the Fisher vector into a compact code for efficiency (the method proposed in [7] seems promising)
– Apply the proposed Fisher vector to other binary features (e.g., audio fingerprints)

[7] Y. Gong et al., "Learning Binary Codes for High-Dimensional Data Using Bilinear Projections," in Proc. of CVPR'13.

- Fisher vector of BMM (Fisher score)

z_X = L_λ ∇_λ L(X|λ)   (decomposed FIM × Fisher score)

L(x_t|λ) = log p(x_t|λ),  p(x_t|λ) = Σ_{i=1}^{N} w_i p_i(x_t|λ)

• Fisher score w.r.t. μ_id:

∂L(x_t|λ)/∂μ_id = ( w_i ∂p_i(x_t|λ)/∂μ_id ) / p(x_t|λ) = γ_t(i) · (−1)^{1 − x_td} / ( μ_id^{x_td} (1 − μ_id)^{1 − x_td} )

∂p_i(x_t|λ)/∂μ_id = (−1)^{1 − x_td} ∏_{e=1, e≠d}^{D} μ_ie^{x_te} (1 − μ_ie)^{1 − x_te}

Occupancy probability (posterior probability):

γ_t(i) = p(i|x_t, λ) = w_i p_i(x_t|λ) / Σ_{j=1}^{N} w_j p_j(x_t|λ)