This page reproduces the content of http://www.slideshare.net/kohta/model-transport-towards-scalable-transfer-learning-on-manifolds.

Uploaded about 2 years ago (2014/08/01) in Technology

- Outline

• Problem setting

• Transfer Learning

• Manifolds, Tangent Spaces, Parallel Transport

• Transporting Methods

• Results

• Remarks

- Problem Setting

• Problem

– Domain 1 has plenty of data, but domain 2, which is reached by moving away from it, has only a little

– Can we still build a good model for domain 2?

• Conditions

– How to move between the domains is known

– Transporting the data itself is expensive

Paper shown on the slide:

Model Transport: Towards Scalable Transfer Learning on Manifolds

Oren Freifeld, MIT, Cambridge, MA, USA, freifeld@csail.mit.edu

Søren Hauberg, DTU Compute, Lyngby, Denmark, sohau@dtu.dk

Michael J. Black, MPI for Intelligent Systems, Tübingen, Germany, black@tue.mpg.de

Abstract

We consider the intersection of two research fields: transfer learning and statistics on manifolds. In particular, we consider, for manifold-valued data, transfer learning of tangent-space models such as Gaussian distributions, PCA, regression, or classifiers. Though one would hope to simply use ordinary R^n-transfer learning ideas, the manifold structure prevents it. We overcome this by basing our method on inner-product-preserving parallel transport, a well-known tool widely used in other problems of statistics on manifolds in computer vision. At first, this straightforward idea seems to suffer from an obvious shortcoming: Transporting large datasets is prohibitively expensive, hindering scalability. Fortunately, with our approach, we never transport data. Rather, we show how the statistical models themselves can be transported, and prove that for the tangent-space models above, the transport “commutes” with learning. Consequently, our compact framework, applicable to a large class of manifolds, is not restricted by the size of either the training or test sets. We demonstrate the approach by transferring PCA and logistic-regression models of real-world data involving 3D shapes and image descriptors.

Figure 1: Model Transport for covariance estimation. Panels: (a) Data on a manifold, (b) Data models, (c) Ordinary translation, (d) Model transport. On nonlinear manifolds, statistics of one class (red) are transported to improve a statistical model of another class (blue). While ordinary translation is undefined (c), and data transport is expensive, a model can be inexpensively transported (green) while preserving data statistics (d).

1. Introduction

In computer vision, manifold-valued data arise often. The advantages of representing such data explicitly on a manifold include a compact encoding of constraints, distance measures that are usually superior to ones from R^n, and consistency. For such data, statistical modeling on the manifold is generally better than statistical modeling in a Euclidean space [12, 18, 29, 39]. Here we consider the first scalable generalization, from R^n to Riemannian manifolds, of certain types of transfer learning (TL). In particular, we consider TL in the context of several popular tangent-space models such as Gaussian distributions (Fig. 1b), PCA, classifiers, and simple linear regression. In so doing, we recast TL on manifolds as TL between tangent spaces. This generalizes those R^n-TL tasks where models learned in one region of R^n are utilized in another. Note, however, that we do not claim that all R^n-TL tasks have this form.

Let M denote an n-dimensional manifold and let T_pM and T_qM denote two tangent spaces to M, at points p, q ∈ M. One cannot simply apply models learned in T_pM to data in T_qM as these, despite both being isomorphic to R^n, are two different spaces: A model on T_pM is usually not even defined in T_qM; see Fig. 1c. Such obstacles, caused by the curvature of M, do not arise in R^n-TL. To address this we could parallel transport (PT) [7] the data from T_pM to T_qM, learn a model of the transported data in T_qM, and use …

- Transfer Learning

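• We want to reuse a model learned in one domain in another domain

– The "correction" needed for that is learned separately

– Many problem settings and methods exist

- Manifolds, Tangent Spaces, Parallel Transport

• Manifolds

– A manifold is, roughly, a generalization of a "curved surface"

– Points on a manifold embedded in Euclidean space (a sphere, for example) can be written as vectors, but they cannot be added or subtracted

Figure annotations: the result of adding such vectors is not a point on the manifold! Comparing vectors is not straightforward either!

- Manifolds, Tangent Spaces, Parallel Transport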

• Meaningful data lie on a manifold

M. Alex O. Vasilescu, http://alumni.media.mit.edu/~maov/research_index.html

- Manifolds, Tangent Spaces, Parallel Transport

• Meaningful data lie on a manifold

S. Taheri, et al.: Towards View-Invariant Expression Analysis Using Analytic Shape Manifolds

Fig. 7. Comparison between the results of parallel transport on the manifold versus that of Euclidean space. The first sequence (its tangent vector from the leftmost to the rightmost shape) is parallel transported to the face on the second and third row, and the new sequence is synthesized on the manifold (second row) and Euclidean space (third row). Row labels: Original Sequence; Parallel Transport on the Manifold; Parallel Transport on the Euclidean Space.

Fig. 6. A sequence of facial expression is a curve on the Grassmann manifold.

Excerpt from the same paper shown on the slide:

… corresponding to each AU [29]. Especially for the CK database, since each AU is represented using a sequence of projection matrices and since expressions occur at different rates, it is necessary to time warp the sequences in order to learn a rate-invariant model for them. Adapting the DTW algorithm to the sequences that reside on a Riemannian manifold is a straightforward task, since DTW can operate with any measure of similarity between the different temporal features. Here, we use the geodesic distance between the projection matrices of different sequences as a distance function and warp all the sequences (corresponding to an AU) to a randomly selected sequence. Then the final model for each AU is obtained by computing the Karcher mean of all the warped sequences. This is a simple and fast approach that works fairly well.

1) Action Units Recognition: Using the learned AU models for the Bosphorus and CK databases, we perform AU recognition. We report the results on seventeen single AUs in the Bosphorus database and nine single or combined AUs in the CK database. The training samples are chosen as images/sequences containing only the target AU occurring in the corresponding local facial components (brow, eye, nose, and mouth). In the Bosphorus database the lack of sufficient landmarks on the faces limits our recognition capabilities. For example we cannot recognize the AU-43 since no landmarks are provided for the eyes. Also for the CK database, since the sequences are mainly corresponding to the expressions and not AUs, we only chose those AUs for which enough training sequences are available. We divide both databases into eight sections, each of which contains images from different subjects. Each time, we use seven sections for training and the remaining sections for testing so that the training and testing sets are mutually exclusive. The average recognition performance is computed on all the sections.

For the Bosphorus database, we perform maximum likelihood (ML) recognition where we find the probability of each test velocity vector comes from the learned Gaussian distribution. But for the CK database, we first warp each test sequence to the learned template using DTW and then use the distance between the two sequences for recognition. Figure 8 shows the confusion matrices for both databases. As the results indicate, for the AUs that are mainly identified by their facial deformations the recognition rate is high, e.g. AU-2, AU-4, and AU-27. However, for AUs whose distinction is more due to the appearance deformations than the geometries, the algorithm may confuse them with the AUs with similar geometries, e.g. AU-16 and AU-25. In these cases, AUs occurring in other parts of the face can be used as cues to remove the ambiguity and improve the recognition.

We also performed a recognition experiment using the Bosphorus database on the Euclidean space, where the normal parallel transport is performed before learning the distributions. While the average recognition rate for AUs on the Grassmann manifold is 83%, this value is 79% on the Euclidean space. Although the recognition rate is improved on the Grassmannian, but it is not considerable. A possible reason can be the fact that in the Bosphorus database the faces are almost always frontal and there are not significant …

- Manifolds, Tangent Spaces, Parallel Transport

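• Manifolds

– More precisely, a manifold is a topological space in which a local coordinate system (a homeomorphism to Euclidean space) can be defined around every point

– For applications such as vision, it is probably fine to think of it as a set that has a notion of "neighborhood"

- Manifolds, Tangent Spaces, Parallel Transport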

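• Tangent space

– At any point we can consider a flat space (a hyperplane) that "touches" the manifold there; this is called the tangent space

– The tangent space is a vector space, so the familiar vector operations can be carried out in it

– In practice, algorithms are usually formulated on the tangent space

Embedded lecture figure "Exponential Map" (source: http://www.inf.ethz.ch/personal/lballan/teaching.html), drawn for the Lie group SO(n) with Lie algebra so(n), a point a on the manifold M and a tangent vector A in T_I M:

Given a Lie group G, with its related Lie Algebra g = T_G(I), there always exists a smooth map from Lie Algebra g to the Lie group G called exponential map. exp(A) is the point in G that can be reached by traveling along the geodesic passing through the identity in direction A, for a unit of time (Note: A defines also the traveling speed).

Annotations added on the slide:

– log(a): maps a point on the manifold to the tangent space around the identity I (the inverse of the exponential map)

– exp(A): maps a point of the tangent space around the identity I to the manifold (a mapping under which straight lines in the tangent space become geodesics on the manifold)

- Manifolds, Tangent Spaces, Parallel Transport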

• Tangent space

– Maps that go back and forth between the manifold and the tangent space can sometimes be defined

– Exponential map: tangent space -> manifold

– Logarithm map: manifold -> tangent space

- Manifolds, Tangent Spaces, Parallel Transport

• Tangent space

– The exp map and related maps are defined through their relation to geodesics

– They are defined so that straight lines in the tangent space are mapped to geodesics on the manifold

– They depend on the distance structure (the metric) of the manifold

- Manifolds, Tangent Spaces, Parallel Transport

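• Lie groups

– A Lie group is a set that is both a group and a manifold

– The group operation is a map on the manifold, and the tangent space (the Lie algebra), the exp map, and so on can often be constructed explicitly, which is convenient in practice

– The manifolds that appear in vision applications are often Lie groups (surfaces excepted)

Definition of a group

A set $G$ with an operation $\cdot : G \times G \to G$ such that, for $g, h, k \in G$:

$g \cdot (h \cdot k) = (g \cdot h) \cdot k$ (associativity)

$\exists\, e \in G,\ g \cdot e = e \cdot g = g$ (existence of an identity element)

$\exists\, g^{-1} \in G,\ g \cdot g^{-1} = g^{-1} \cdot g = e$ (existence of inverses)

- Manifolds, Tangent Spaces, Parallel Transport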

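• Matrices and Lie groups

– Many Lie groups that appear in practice are matrix groups: they are groups under matrix multiplication (subgroups of the general linear group)

– For such groups the matrix exponential is the natural exponential map

$\exp(X) = \sum_{n=0}^{\infty} \frac{1}{n!} X^n$

- Manifolds, Tangent Spaces, Parallel Transport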

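• Lie groups

– Example of a Lie group: SO(3), the group of 3D rotations

$SO(3) = \{ R \mid R \in \mathbb{R}^{3 \times 3},\ R R^T = I,\ |R| = 1 \}$

– so(3): the Lie algebra of SO(3) (the tangent space at the identity I)

$\Omega = \begin{pmatrix} 0 & -\omega_1 & \omega_2 \\ \omega_1 & 0 & -\omega_3 \\ -\omega_2 & \omega_3 & 0 \end{pmatrix}, \quad \omega_1, \omega_2, \omega_3 \in \mathbb{R}$

$\exp(\Omega) = I + \frac{\sin|\Omega|}{|\Omega|}\,\Omega + \frac{1 - \cos|\Omega|}{|\Omega|^2}\,\Omega^2$, where $|\Omega| = \sqrt{\omega_1^2 + \omega_2^2 + \omega_3^2}$

$\log R = \frac{\theta}{2 \sin\theta}\,(R - R^T), \quad \theta = \cos^{-1}\!\left(\frac{\mathrm{Tr}(R) - 1}{2}\right)$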
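The closed forms for SO(3) can be checked numerically. The following is a minimal NumPy sketch, not code from the slides: the helper names `hat`, `exp_so3`, `log_so3` and `exp_series` are hypothetical, and |Ω| is assumed to mean sqrt(ω1² + ω2² + ω3²). It compares the closed-form exponential with a truncated version of the power series exp(X) = Σ Xⁿ/n! and checks that the logarithm inverts it for rotation angles below π.

```python
import numpy as np

def hat(w):
    """Build a skew-symmetric matrix (an element of so(3)) from three reals."""
    w1, w2, w3 = w
    return np.array([[0.0, -w1, w2],
                     [w1, 0.0, -w3],
                     [-w2, w3, 0.0]])

def exp_so3(Omega):
    """Closed-form exponential map so(3) -> SO(3)."""
    theta = np.sqrt(0.5 * np.sum(Omega ** 2))  # equals sqrt(w1^2 + w2^2 + w3^2)
    if theta < 1e-12:
        return np.eye(3) + Omega  # first-order approximation near the identity
    return (np.eye(3)
            + (np.sin(theta) / theta) * Omega
            + ((1.0 - np.cos(theta)) / theta ** 2) * (Omega @ Omega))

def log_so3(R):
    """Closed-form logarithm map SO(3) -> so(3), valid for angles below pi."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta < 1e-12:
        return 0.5 * (R - R.T)
    return (theta / (2.0 * np.sin(theta))) * (R - R.T)

def exp_series(X, terms=30):
    """Truncated power series: sum of X^n / n! for n < terms."""
    result = np.eye(X.shape[0])
    term = np.eye(X.shape[0])
    for n in range(1, terms):
        term = term @ X / n
        result = result + term
    return result

Omega = hat([0.3, -0.2, 0.5])
R = exp_so3(Omega)
print(np.allclose(R, exp_series(Omega)), np.allclose(log_so3(R), Omega))
```

The closed form avoids summing the series, which is one reason matrix Lie groups are convenient in practice; the sketch does not handle rotation angles close to π, where the logarithm formula becomes ill-conditioned.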

- Manifolds, Tangent Spaces, Parallel Transport

• Parallel Transport

– Moves a tangent vector (an element of a tangent space) along the manifold

– Defined by a quantity called a connection

• Various connections can be considered; usually the connection that preserves the inner product of tangent vectors (the Levi-Civita connection) is used

– Transporting tangent vectors = transporting a tangent-space model

$\Gamma^{c}_{st} : T_{c(s)}M \to T_{c(t)}M$: the map from vectors at the point c(s) on the curve c to vectors at the point c(t)

Figure 5.3: Parallel transport of x0 along the geodesic between p and q. (Labels in the figure: $x_0 \in T_pM$, $x_1 \in T_qM$, points p and q, curve c.)

Excerpt shown on the slide:

1. $\Gamma_{c(0) \to c(0)}(x_0) = x_0$.

2. Let u, s and t be in [0, 1]. If $\Gamma_{c(u) \to c(t)}$ and $\Gamma_{c(s) \to c(u)}$ are defined in a similar way to $\Gamma_{c(0) \to c(t)}$, then $\Gamma_{c(u) \to c(t)} \circ \Gamma_{c(s) \to c(u)} = \Gamma_{c(s) \to c(t)}$.

3. If x0 is fixed and t varies, then $\Gamma_{c(0) \to c(t)}(x_0) : [0, 1] \to \mathbb{R}^n$ is a smooth function of t.

4. If x and y are in $T_pM$, then, for every t, their inner-product is preserved:

$\langle x, y \rangle_{c(0)} = \left\langle \Gamma_{c(0) \to c(t)}(x),\ \Gamma_{c(0) \to c(t)}(y) \right\rangle_{c(t)}$. (5.18)

Consequently, norms of tangent vectors and angles between tangent vectors are preserved; see Fig. 5.4.

5. $\Gamma_{c(0) \to c(t)} : T_{c(0)}M \to T_{c(t)}M$ is a bijective linear map.

5.4.2 The General Case

We now proceed to the more general case. Let M be a geodesically-complete Riemannian manifold. To see how a covariance can be moved across M, consider first tangent vectors. While simple translation will not do, we can transport vectors from one tangent space to another using parallel transport, to be defined as follows⁵.
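The properties listed above can be observed on the simplest curved example. The following is a minimal sketch under stated assumptions, not the thesis code: it uses the unit sphere S² embedded in R³ (a manifold chosen here only for illustration), and the function names `sphere_log` and `sphere_transport` are hypothetical. On the sphere the Levi-Civita transport along a geodesic has a closed form: the component of the vector along the direction of travel is rotated with the geodesic, and the orthogonal component is left unchanged.

```python
import numpy as np

def sphere_log(p, q):
    """Log map on the unit sphere: the tangent vector at p that points to q."""
    c = float(np.clip(p @ q, -1.0, 1.0))
    w = q - c * p                      # part of q orthogonal to p
    nw = np.linalg.norm(w)
    if nw < 1e-15:                     # identical (or antipodal) points
        return np.zeros_like(p)
    return np.arctan2(nw, c) * w / nw

def sphere_transport(p, q, v):
    """Parallel transport of v from T_p to T_q along the geodesic from p to q."""
    lp = sphere_log(p, q)
    d = np.linalg.norm(lp)             # geodesic distance between p and q
    if d < 1e-15:
        return v.copy()
    u = lp / d                                  # unit direction of travel at p
    u_q = -np.sin(d) * p + np.cos(d) * u        # the same direction, seen at q
    return v + (u @ v) * (u_q - u)

p = np.array([0.0, 0.0, 1.0])
q = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)
x = np.array([0.3, 0.4, 0.0])      # tangent vectors at p (orthogonal to p)
y = np.array([-0.2, 0.5, 0.0])
tx = sphere_transport(p, q, x)
ty = sphere_transport(p, q, y)

# Transported vectors are tangent at q, and inner products are preserved.
print(abs(tx @ q) < 1e-12, np.isclose(tx @ ty, x @ y))
```

Because the map is linear and preserves inner products, norms and angles between tangent vectors are unchanged, which is the property the model-transport argument relies on.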

5 Another way to define it is through the notion of a connection, a term we avoid elaborating on. See [27].

- Model Transport

• Karcher mean (Fréchet mean)

$X_f \equiv \arg\min_{X} \sum_i d(X, X_i)^2$

An efficient way to compute it: X. Pennec, Probabilities and statistics on Riemannian manifolds: Basic tools for geometric measurements

Embedded figure "SO(2) and SO(3): Tangent Spaces" (What are the tangent spaces of these two manifolds?): SO(2) is a 1-manifold whose tangent space is a vector space with 1 dimension; SO(3) is a 3-manifold whose tangent space is a vector space with 3 dimensions; each is a subspace of … They are matrices.

- Model Transport

• Numerical computation of parallel transport

– Schild’s Ladder

M. Lorenzi, et al.: Schild’s ladder for the parallel transport of deformations in time series of images, IPMI 2011

Excerpt shown on the slide:

Figure 5.6: An illustration of Schild’s Ladder for approximating the parallel transport (for K = 3). See text for details.

5.4.4.2 Schild’s ladder

When there is no closed-form solution, we use Schild’s ladder [106], a strikingly simple numerical technique to compute an arbitrarily accurate approximation⁶ of the LC parallel transport using only the Exp/Log maps⁷. We are not the first to use Schild’s ladder in computer vision applications (although we are the first to use it in the context of transfer learning). For example, the technique has been recently used in modeling longitudinal medical data [96, 115] and in tracking [66].

Schild’s ladder is a numerical scheme that enables computation of the LC parallel transport [106]. We wish to parallel transport a tangent vector $v_0$ from $x_0$ to $x_K$ on M along the geodesic curve $\alpha$ that joins them. Schild’s ladder places points along $\alpha$ and approximately parallel transports $v_0$ to these by forming generalized parallelograms⁸ on M (see Fig. 5.6): Let $\{x_1, \dots, x_{K-1}\}$ denote points along $\alpha$. Start by computing $a_0 = \mathrm{Exp}_{x_0}(v_0)$ and the midpoint $b_1$ of the geodesic segment joining $x_1$ and $a_0$. Follow the geodesic from $x_0$ through $b_1$ for twice its length to the point $a_1$. This scheme is repeated for all sampled points along the geodesic from $x_0$ to $x_K$. The final parallel transport of $v_0$ …
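The construction described above uses only Exp and Log maps, so it can be sketched in a few lines. The following is a minimal illustration on the unit sphere S² (chosen here only because its transport has a closed form to compare against); it is not the authors' implementation. The function names are hypothetical, the final step `Log_{x_K}(a_K)` is the usual last rung of the ladder and is assumed here because the excerpt is cut off, and the rescaling of `v0` by `eps` is an extra assumption added to keep the parallelograms small.

```python
import numpy as np

def sphere_exp(p, v):
    """Exp map on the unit sphere."""
    n = np.linalg.norm(v)
    if n < 1e-15:
        return p.copy()
    return np.cos(n) * p + np.sin(n) * v / n

def sphere_log(p, q):
    """Log map on the unit sphere."""
    c = float(np.clip(p @ q, -1.0, 1.0))
    w = q - c * p
    nw = np.linalg.norm(w)
    if nw < 1e-15:
        return np.zeros_like(p)
    return np.arctan2(nw, c) * w / nw

def sphere_transport(p, q, v):
    """Closed-form transport on the sphere, used only as a reference."""
    lp = sphere_log(p, q)
    d = np.linalg.norm(lp)
    if d < 1e-15:
        return v.copy()
    u = lp / d
    return v + (u @ v) * (-np.sin(d) * p + np.cos(d) * u - u)

def schild_ladder(x0, xK, v0, K=50, eps=1e-3):
    """Approximate the transport of v0 from x0 to xK with K ladder rungs."""
    norm = np.linalg.norm(v0)
    if norm == 0.0:
        return np.zeros_like(v0)
    scale = eps / norm                  # shrink v0 so each parallelogram is small
    step = sphere_log(x0, xK)
    xs = [sphere_exp(x0, (k / K) * step) for k in range(K + 1)]
    a = sphere_exp(xs[0], scale * v0)   # a_0 = Exp_{x_0}(v_0)
    for k in range(K):
        # b: midpoint of the geodesic segment joining x_{k+1} and a_k
        b = sphere_exp(xs[k + 1], 0.5 * sphere_log(xs[k + 1], a))
        # a_{k+1}: follow the geodesic from x_k through b for twice its length
        a = sphere_exp(xs[k], 2.0 * sphere_log(xs[k], b))
    return sphere_log(xs[K], a) / scale  # read the vector off at x_K and undo the scaling

p = np.array([0.0, 0.0, 1.0])
q = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)
v = np.array([0.3, 0.4, 0.0])
approx = schild_ladder(p, q, v)
exact = sphere_transport(p, q, v)
print(np.linalg.norm(approx - exact))   # small compared with the norm of v
```

On manifolds without closed-form Exp/Log maps the two helper functions would themselves be numerical solvers, as footnote 7 of the excerpt notes.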

6 This is not true for every parallel transport, but it does hold at least for parallel transport associated with a symmetric connection [82] such as LC, the only connection which is both metric and symmetric.

7 Note that if Exp/Log maps are unavailable analytically, then Exp maps can be computed by integrating an initial value problem and geodesics/Log maps can be computed by solving a boundary value problem [111].

8 This generalization is known as the Levi-Civita parallelogramoid.

- Transporting Methods

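• Task 1

– The source domain L has plenty of data $\{x_i^L\}_{i=1}^{N_L}$, while the target domain S has only a few samples $\{x_i^S\}_{i=1}^{N_S}$ (enough to compute a mean)

– We want the covariance matrix corresponding to $\{x_i^S\}$

• Task 2

– We have labeled data (discrete or continuous labels) $\{x_i^A\}_{i=1}^{N_A}$, $y_i^A = \mathrm{label}(x_i^A)$, and unlabeled data $\{x_i^B\}_{i=1}^{N_B}$

– We want the labels of $\{x_i^B\}$ (classification, regression)

Covariance matrix

$\mathrm{Cov}(\{p_i\}_{i=1}^{N}) = \frac{1}{N-1} \sum_i \log_\mu(p_i)\, \log_\mu(p_i)^T$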
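The covariance above is computed in the tangent space at the Karcher mean μ, so both steps can be illustrated together. The following is a minimal sketch on the unit sphere S² (an assumption made for illustration; the slides work with other manifolds), with hypothetical function names. The mean is found with the common fixed-point iteration μ ← Exp_μ(mean of Log_μ(p_i)), which is one standard choice and not necessarily the efficient method cited on the Karcher mean slide.

```python
import numpy as np

def sphere_exp(p, v):
    """Exp map on the unit sphere."""
    n = np.linalg.norm(v)
    if n < 1e-15:
        return p.copy()
    return np.cos(n) * p + np.sin(n) * v / n

def sphere_log(p, q):
    """Log map on the unit sphere."""
    c = float(np.clip(p @ q, -1.0, 1.0))
    w = q - c * p
    nw = np.linalg.norm(w)
    if nw < 1e-15:
        return np.zeros_like(p)
    return np.arctan2(nw, c) * w / nw

def karcher_mean(points, iters=100, tol=1e-12):
    """Fixed-point iteration for the minimizer of the summed squared distances."""
    mu = points[0].copy()
    for _ in range(iters):
        update = np.mean([sphere_log(mu, x) for x in points], axis=0)
        if np.linalg.norm(update) < tol:
            break
        mu = sphere_exp(mu, update)
    return mu

def tangent_covariance(points, mu):
    """Cov = 1/(N-1) * sum_i log_mu(p_i) log_mu(p_i)^T, in ambient coordinates."""
    logs = np.array([sphere_log(mu, x) for x in points])  # shape (N, 3)
    return logs.T @ logs / (len(points) - 1)

# Four points placed symmetrically around the north pole.
pole = np.array([0.0, 0.0, 1.0])
v1 = np.array([0.3, 0.0, 0.0])
v2 = np.array([0.0, 0.2, 0.0])
points = [sphere_exp(pole, s * v) for v in (v1, v2) for s in (1.0, -1.0)]

mu = karcher_mean(points)
cov = tangent_covariance(points, mu)
print(np.round(mu, 6))
print(np.round(cov, 6))
```

The covariance is expressed in the 3D ambient coordinates of the tangent plane, so it has rank at most 2 here; in Task 1 the target domain has too few samples to estimate such a matrix reliably, which is what motivates borrowing the one from domain L.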

P. T. Fletcher, et al.: Principal Geodesic Analysis for the Study of Nonlinear Statistics of Shape, Trans. Med. Imag. (2004)

- Transporting Methods

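• Data Transport

– Transport the data of domain A to the neighborhood of domain B

• Expensive when there is a lot of data

• Basis Transport

– Transport the basis vectors (of the tangent space) that make up the model to the neighborhood of domain B

• Expensive when the model has many dimensions

- Transporting Methods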

• Model Transport (proposed)

– Only the small number of vectors that the model depends on need to be transported

• (Although I suspect that in practice the number is often not that small…)

– The basic idea is to transport only the (tangent) vectors that make up the model

- Transporting Methods

• PCA Transport

– Points before and after the transport: $p, q \in M$

– Data in the tangent space around p: $\{x_i\}_{i=1}^{N} \subset T_pM$

– Data in the tangent space around q: $\{\tilde{x}_i\}_{i=1}^{N} \subset T_qM$

– PCA / SVD model in $T_pM$:

$X X^T = V S^2 V^T$

$X = V S U^T$

$X = [x_1, \dots, x_N]$

$V = [v_1, \dots, v_n]$

– The PCA / SVD model in $T_qM$ is then given by

$\tilde{X} \tilde{X}^T = \tilde{V} S^2 \tilde{V}^T$

$\tilde{X} = \tilde{V} S U^T$

$\tilde{X} = [\tilde{x}_1, \dots, \tilde{x}_N]$

$\tilde{V} = [\tilde{v}_1, \dots, \tilde{v}_n]$
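The statement that the same S and U appear in both tangent spaces can be checked with plain linear algebra. The following is a minimal sketch under an explicit assumption: tangent vectors are written in orthonormal coordinates, so that an inner-product-preserving transport acts as an orthogonal matrix `L`. Here `L` is a random orthogonal matrix standing in for a real parallel transport, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 5, 200                                   # tangent-space dimension, number of samples

# Columns of X are the tangent vectors x_1, ..., x_N in T_pM.
X = rng.normal(size=(n, N)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])[:, None]

# Stand-in for the transport from T_pM to T_qM: a random orthogonal matrix.
L, _ = np.linalg.qr(rng.normal(size=(n, n)))

# Learn the model at p. NumPy returns X = V @ diag(S) @ Ut, matching X = V S U^T.
V, S, Ut = np.linalg.svd(X, full_matrices=False)

X_t = L @ X        # data transport: N vectors are moved
V_t = L @ V        # model transport: only the n basis vectors are moved

# Learning on the transported data gives the same singular values ...
S_t = np.linalg.svd(X_t, compute_uv=False)
print(np.allclose(S, S_t))

# ... and the transported basis diagonalizes the transported scatter matrix.
print(np.allclose(V_t @ np.diag(S ** 2) @ V_t.T, X_t @ X_t.T))
```

Moving `V` costs n transports (or k, if only the top k principal directions are kept) regardless of N, which is the sense in which transport "commutes" with learning.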

- Transporting Methods

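• Linear Regression Transport

– Regression model on the tangent space, with loss functions $l_i$:

$(\hat{\alpha}, \hat{\alpha}_0) = \operatorname*{argmin}_{\alpha \in T_pM,\ \alpha_0 \in \mathbb{R}} \sum_{i=1}^{N} l_i(x_i^T \alpha + \alpha_0)$

– The model in $T_qM$ is then given by

$\tilde{\alpha} = A_q L A_p^{-1} \hat{\alpha}$

$\tilde{\alpha}_0 = \hat{\alpha}_0$

$A_p, A_q$: the metric tensors at the points p and q

$L$: the parallel transport from p to q (a linear map)

- Transporting Methods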

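• Practical application

$\Sigma_L, V_L$: the model in the original domain

$\Sigma_S, V_S$: the model built from the target-domain data only

$\Sigma_\Gamma, V_\Gamma$: the model from L after transport

$V_F$: the model that uses both $V_\Gamma$ and $V_S$ and keeps only the top k dimensions (the same dimension as the other models): orthogonalize($[V_\Gamma, V_S]$)

$\Sigma_\lambda, V_\lambda$: the model that combines the Γ and S models by shrinkage estimation: $\Sigma_\lambda = \lambda \Sigma_\Gamma + (1 - \lambda) \Sigma_S$

- Results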

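• Human body model (mesh) 1

– Lie group of dimension n = 129300

– Two domains that differ in gender (transfer from women to men)

• 1000 female samples, 50 male samples

– PCA models

• Female model: 200 dimensions; male model: 50 dimensions

• Evaluated on 1000 test samples $\{z_i\}$ by the error between the mesh produced by the model and the ground truth (the error is defined by geodesic distance)

• The reconstruction by a model is $\exp_\mu(V V^T \log_\mu(z_i)) \in M$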
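The reconstruction formula reads: map the test point into the tangent space at μ, project onto the PCA subspace spanned by the columns of V, and map the result back to the manifold. The following is a minimal sketch on the unit sphere S² with a one-dimensional subspace, an assumption made purely for illustration (the experiments use a 129300-dimensional Lie group); the function names are hypothetical.

```python
import numpy as np

def sphere_exp(p, v):
    """Exp map on the unit sphere."""
    n = np.linalg.norm(v)
    if n < 1e-15:
        return p.copy()
    return np.cos(n) * p + np.sin(n) * v / n

def sphere_log(p, q):
    """Log map on the unit sphere."""
    c = float(np.clip(p @ q, -1.0, 1.0))
    w = q - c * p
    nw = np.linalg.norm(w)
    if nw < 1e-15:
        return np.zeros_like(p)
    return np.arctan2(nw, c) * w / nw

def reconstruct(mu, V, z):
    """Exp_mu(V V^T Log_mu(z)); the columns of V are orthonormal tangent vectors at mu."""
    t = sphere_log(mu, z)
    return sphere_exp(mu, V @ (V.T @ t))

def geodesic_distance(a, b):
    return np.linalg.norm(sphere_log(a, b))

mu = np.array([0.0, 0.0, 1.0])
V = np.array([[1.0], [0.0], [0.0]])                 # one principal direction at mu

z_on = sphere_exp(mu, np.array([0.4, 0.0, 0.0]))    # lies along the modeled direction
z_off = sphere_exp(mu, np.array([0.0, 0.3, 0.0]))   # lies along an unmodeled direction

err_on = geodesic_distance(reconstruct(mu, V, z_on), z_on)
err_off = geodesic_distance(reconstruct(mu, V, z_off), z_off)
print(err_on, err_off)
```

A point that varies only along a modeled direction is reconstructed almost exactly, while variation outside the subspace is lost and shows up as geodesic error, which is the quantity the experiments average.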

O. Freifeld and M. J. Black: Lie Bodies: A Manifold Representation of 3D Human Shape

- Results

• Human body model (mesh)

– Lie group of dimension n = 129300

– Two domains that differ in gender (transfer from women to men)

Panels of Figure 4 as annotated on the slide:

(a) $V_L$: the female model used as is

(b) $V_S$: the model built from the male samples

(c) $V_\Gamma$: the female model after transport

(d) $V_F$: the fusion model

(e) $V_\lambda$: shrinkage estimate of the covariance matrix

O. Freifeld and M. J. Black: Lie Bodies: A Manifold Representation of 3D Human Shape

Excerpt from the paper shown on the slide:

Figure 3: Summary for shape experiments. Left: Gender. Right: BMI. The bars represent the overall reconstruction error for $V_L$, $V_S$, $V_\Gamma$, and $V_F$. For a given model, the height of the bar represents the reconstruction error measured in terms of SGE averaged over the entire test dataset as well as all of the mesh triangles.

Figure 4: Model mean error: Genders. Blue and red indicate small and large errors respectively. The heat maps are overlaid over the points of tangency associated with the models: p for (a), and q for (b-e). See text for details.

Figure 5: Selected results: Gender. Each column represents a different test body. The heat maps are overlaid on the reconstructions using different models. Row labels: Ground Truth; $V_L$ (Women); $V_S$ (Men); $V_\Gamma$ (PT); $V_F$ (Fuse).

Figure 6: Model mean error: BMI. Analogous to Fig. 4.

… copies of a 6-dimensional Lie group, which is isomorphic to the product of three smaller ones, including SO(3); thus, n = 129300. While here we do not advocate a particular manifold nor does our work focus on shape spaces, this M enables us to easily demonstrate the MT framework. The data consist of aligned⁶ 3D scans of real people [31]. On this M, the LC PT is computed as follows: For the SO(3) components of M, a closed-form solution is available [9], while for the rest we use Schild’s ladder (see, e.g., [17,24]).

From Venus to Mars. We first illustrate the surprising power of MT. The training data contains $N_L = 1000$ shapes of women (Fig. 1a, red; shown here on a 2D manifold for illustration) but only $N_S = 50$ shapes of men (blue), where all shapes are represented as points on M. As it is reasonable to expect some aspect of shape variation among women may apply to men as well, we model the shape variation of men while leveraging that of women. We first compute the Karcher means for women and men denoted p and q, respectively (Fig. 2a–2b). We then compute their PCA models, $V_L \subset T_pM$ and $V_S \subset T_qM$ ($k_L = 200$ and $k_S = 50$), as well as $V_\Gamma = \Gamma(V_L)$. For an animated illustration see [13]. We also compute $V_F$ and $V_\lambda$ using the procedures from Sec. 4.4. We evaluate performance on 1000 test male shapes, whose deformations serve as ground-truth. Let $V \in \{V_L, V_S, V_\Gamma, V_F, V_\lambda\}$. Let µ denote the point of tangency; i.e., p for $V_L$ and q otherwise. Let $z_i \in M$ denote the true deformation of test example i. Its reconstruction is $\mathrm{Exp}_\mu(V V^T \mathrm{Log}_\mu(z_i)) \in M$. We then com-

6 MT also applies to some shape spaces that do not require alignment.

- Results

• Human body model (mesh) 2

– Lie group of dimension n = 129300

– Two domains that differ in BMI (transfer from BMI <= 30 to BMI > 30)

Panels of Figure 6 as annotated on the slide:

(a) $V_L$: the BMI <= 30 model used as is

(b) $V_S$: the model built from the BMI > 30 samples

(c) $V_\Gamma$: the BMI <= 30 model after transport

(d) $V_F$: the fusion model

(e) $V_\lambda$: shrinkage estimate of the covariance matrix

- Results

• Human body model (mesh)

– Error values for each case

- Results

• Classifier Transport

– A facial-expression classifier (2 classes) learned from an oblique viewing angle is transferred to the frontal viewing angle

• For each quarter of the image, the covariance matrix of 5-dimensional features (a Symmetric Positive Definite (SPD) matrix) is computed and used as the feature

– The set of SPD matrices forms a manifold (not a Lie group)

– 168 samples in both domains

O. Tuzel, et al.: Region Covariance: A Fast Descriptor for Detection and Classification

Excerpt from the paper shown on the slide:

… compute, for each triangle, the Squared Geodesic Error (SGE) between the reconstruction and the true deformation. Fixing i, SGE is averaged over all body triangles, yielding the Mean SGE (MSGE) of the ith body. Overall performance of V is defined by averaging MSGE over all test examples. MSGE results are summarized in Fig. 3 (left). To visualize, we average the SGE, per triangle, over all test examples, and display these per-triangle errors over the mesh associated with µ (Fig. 4). Figure 4a shows that $V_L$ performs very poorly; a shape model of women fails to model men. While the errors for $V_S$ are much lower (Fig. 4b), there are still noticeable errors due to poor generalization. The surprise is Fig. 4c, which shows the result for $V_\Gamma$: the PT dramatically improves the female model (Fig. 4a) to the point it fares comparably with the male model (Fig. 4b), although the only information used from the male data is the mean. Combining transported and local models lets us do even better. Figure 4d shows that $V_F$ significantly improves over $V_S$ or $V_\Gamma$. Figure 4e shows the regularized model, $V_\lambda$, which has the same dimensionality as $V_L$ and still performs well. Figure 5 shows selected results for test bodies; see [13] for additional results and reconstructions.

From Normal-Weight to Obesity. A good statistical shape model of obese women is important for fashion and health applications but is difficult to build since the data are scarce as reflected by their paucity in existing body shape datasets [31]. This experiment is similar to the previous one, but both the data and the results are of different nature. Here, we have 1000 shapes of women with BMI ≤ 30 but only 50 shapes of women with BMI > 30. We compute means and subspaces as before. Figure 2c shows q, the high-BMI mean; p, the normal-BMI mean, is not shown as it is very similar to p from the gender experiment. Figures 3 (right) and 6 summarize the results. Compared with the gender experiment there are two main differences: 1) Here $V_\Gamma$ is already much better than $V_S$ so fusion only makes a small difference. 2) Error bars (Fig. 3, right) are larger than before (Fig. 3, left) due to the limited amount of test data available for high-BMI women; this is truly a small-sample class: we were able to obtain only 50 test examples. Compared with using $V_S$, reconstruction is noticeably improved using our method ($V_F$). In both experiments, results for V …

Figure 7: Classifier-transport example. Select images. Top: First data set. Bottom: Second data set. In each row, examples from class 1 (left) and class 2 (right) are shown.

… Schild’s ladder, M is not a Lie group⁸ and n = 60. The datasets $\{p_i\}_{i}^{N_A}$ and $\{q_j\}_{j}^{N_B}$ reflect two different viewing directions; $N_A = N_B = 168$. The labels of $\{p_i\}_{i}^{N_A}$ are known, those of $\{q_j\}_{j}^{N_B}$ withheld. See Fig. 7 for examples. We compute p and q, the means of the datasets. Then, using $\{\mathrm{Log}_p(p_i)\}_{i}^{N_A} \subset T_pM$, we learn a logistic-regression model. This classifier, defined on $T_pM$, is correct 59% of the time when applied to $\{\mathrm{Log}_p(q_j)\}_{j}^{N_B} \subset T_pM$. Applying the transported model to $\{\mathrm{Log}_q(q_j)\}_{j}^{N_B} \subset T_qM$ improves performance to 67%. Thus, for the same unannotated $\{q_j\}_{j}^{N_B}$, MT improves over the baseline. Note we had to PT only one vector; even for such a small dataset the speed gain is already significant.

6. Conclusion

Our work is the first to suggest a framework for generalizing transfer learning (TL) to manifold-valued data. As is well-known, parallel transport (PT) provides a principled way to move data across a manifold. We follow this reasoning in our TL tasks, but rather than transporting data we transport models (so the cost does not depend on the size of the data) and show that for many models the approaches are equivalent. Thus, our framework naturally scales to large datasets. Our experiments show that not only is this mathematically sound and computationally inexpensive but also that in practice it can be useful for modeling real data.

Acknowledgments This work is supported in part by NIH-NINDS EUREKA (R01-NS066311). S.H. is supported in part by the Villum Foundation and the Danish Council for Independent Research (Natural Sciences).

look nearly identical to VF, and are not shown. See [13] for

individual reconstruction results.

References

5.2. Classification Transport and Image Descriptors

[1] V. Arsigny, P. Fillard, X. Pennec, and N. Ayache. Log-

For Task II, our data consist of facial images7 and the

euclidean metrics for fast and simple calculus on diffusion

goal is binary facial-expression classification. Images are

tensors. MR in medicine, 56(2):411–421, 2006. 2

described by SPD matrices that encode normalized corre-

[2] M. F. Beg, M. I. Miller, A. Trouv´e, and L. Younes. Comput-

ing large deformation metric mappings via geodesic flows of

lations of pixel-wise features [40]. Each quarter of an im-

diffeomorphisms. IJCV, 61(2):139–157, 2005. 2

age is described by a 5 ⇥ 5 SPD matrix, yielding an im-

age descriptor in M = SP D(5)4. PT is computable by

8Since SPD is not a matrix Lie group. While not used here, some Lie

group structure can still be imposed to get a nonstandard matrix Lie group;

7From www.wisdom.weizmann.ac.il/˜vision/FaceBase

i.e., the binary operation will not be the matrix product. - Results

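• Classifier Transport

– A facial-expression classifier (2 classes) trained on an oblique viewing angle is transferred to the frontal viewing angle

• Using the oblique-angle classifier as is gave 59% accuracy; with transfer, accuracy improved to 67%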
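The classifier-transport step summarized above can be sketched on a single SPD factor. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes the affine-invariant metric on SPD matrices, for which parallel transport has the closed form V -> E V E^T with E = (Q P^-1)^(1/2), and uses that closed form in place of Schild's ladder. All function names, the random matrices standing in for the dataset means and learned weights, and the idealized domain shift X -> E X E^T are hypothetical.

```python
import numpy as np

def sym_apply(S, f):
    """Apply a scalar function f to the eigenvalues of a symmetric matrix S."""
    S = 0.5 * (S + S.T)                      # remove round-off asymmetry
    w, U = np.linalg.eigh(S)
    return (U * f(w)) @ U.T

def inner(P, U, V):
    """Assumed affine-invariant inner product <U, V>_P = tr(P^-1 U P^-1 V)."""
    Pi = np.linalg.inv(P)
    return float(np.trace(Pi @ U @ Pi @ V))

def log_map(P, X):
    """Log_P(X): the tangent vector at P that points toward X."""
    Ph = sym_apply(P, np.sqrt)
    Pih = sym_apply(P, lambda w: 1.0 / np.sqrt(w))
    return Ph @ sym_apply(Pih @ X @ Pih, np.log) @ Ph

def transport_matrix(P, Q):
    """E = (Q P^-1)^(1/2); parallel transport from P to Q maps V to E V E^T."""
    Ph = sym_apply(P, np.sqrt)
    Pih = sym_apply(P, lambda w: 1.0 / np.sqrt(w))
    return Ph @ sym_apply(Pih @ Q @ Pih, np.sqrt) @ Pih

def score(base, W, b, X):
    """Decision value of a linear tangent-space classifier (W, b) at `base`."""
    return inner(base, W, log_map(base, X)) + b

def random_spd(rng, n):
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

rng = np.random.default_rng(0)
n = 5
P, Q = random_spd(rng, n), random_spd(rng, n)  # stand-ins for the two dataset means
W = rng.standard_normal((n, n))
W = W + W.T                                    # stand-in for weights learned in T_P M
b = 0.1                                        # bias is a scalar and needs no transport

E = transport_matrix(P, Q)
W_q = E @ W @ E.T                              # the only vector that is transported

X = random_spd(rng, n)                         # a source-domain sample
X_shifted = E @ X @ E.T                        # same sample after an idealized shift
print(score(P, W, b, X), score(Q, W_q, b, X_shifted))
```

Under the idealized shift the two printed decision values agree, because the transport preserves the inner product; real data will deviate from that shift, so the transported classifier is only expected to help, not to be exact. The target samples themselves are never transported, only log-mapped at their own mean.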


- Summary

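• A method was obtained for transporting tangent-space models between two different domains that lie on the same manifold

• For human body mesh models and for classifiers, using information from the source domain improved model accuracy in the destination domain

- Thoughts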

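• Are "tangent-space models" unrealistic?

– With a linear SVM, for example, one often (in practice) trains directly on a large feature vector

• The model is then a vector in the large linear space in which the manifold is embedded

→ Is it acceptable to transport the orthogonal complement of the tangent space as is?

– What happens when the manifolds of the positive and negative examples differ greatly?

• Taken together, they might fill the entire Euclidean space

– The method assumes the data manifold is known

• It is unclear how often a meaningful manifold is actually known

- Thoughts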

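– If anything, the intended use seems to be cases where the manifold is fixed at feature-design time

• The sphere, SPD, Lie groups, ...

• Effect and applications

– The gains do not seem as large as claimed

– Training is offline anyway, so one could arguably just transport the data

– Are there uses where this fits well?

• Online learning

• Tracking

– e.g. subspace tracking (Grassmann manifold)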
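The two routes weighed above (transport the data, or transport the model) can be compared on a toy case. The sketch below is an illustration under stated assumptions, not the paper's experiment: it uses the unit sphere S^2, where parallel transport along the shortest great-circle arc is a linear map on tangent vectors, and it uses the second-moment matrix of tangent vectors as the "model". The function name `transport_operator` and all data are hypothetical.

```python
import numpy as np

def transport_operator(p, q):
    """Linear map taking tangent vectors at p to tangent vectors at q by
    parallel transport along the shortest great-circle arc (requires p != -q)."""
    return np.eye(3) - np.outer(p + q, q) / (1.0 + p @ q)

rng = np.random.default_rng(0)
p = np.array([0.0, 0.0, 1.0])   # base point of the data-rich domain
q = np.array([1.0, 0.0, 0.0])   # base point of the data-poor domain

# N tangent vectors at p (third component 0, so orthogonal to p),
# standing in for log-mapped data.
N = 10_000
V = rng.standard_normal((N, 3)) * np.array([0.3, 0.1, 0.0])

T = transport_operator(p, q)

# Route A: transport every data vector, then learn at q (N transports).
V_q = V @ T.T
C_data = V_q.T @ V_q / N

# Route B: learn at p, then transport the model once.
C_p = V.T @ V / N
C_model = T @ C_p @ T.T

print(np.max(np.abs(C_data - C_model)))
```

Both routes give the same second-moment matrix up to round-off, and the eigenvalues (the PCA variances) are unchanged by the transport. Route B touches the N data vectors only at p, which is where the savings would come from on manifolds whose transport is costly, for example when it has to be approximated numerically.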