This page reproduces the content of http://www.slideshare.net/jamesmcm03/the-gaussian-process-latent-variable-model-gplvm.

Uploaded 2014/10/23.

An introduction to the Gaussian Process Latent Variable Model (GPLVM)


Gaussian Process Latent Variable Model (GPLVM)

James McMurray

PhD Student

Department of Empirical Inference

11/02/2014

Outline of talk

Introduction

Why are latent variable models useful?

Definition of a latent variable model

Graphical Model representation

PCA recap

Principal Components Analysis (PCA)

Probabilistic PCA based methods

Probabilistic PCA (PPCA)

Dual PPCA

GPLVM

Examples

Practical points

Variants

Conclusion

Conclusion

References

Why are latent variable models useful?

Data has structure.

Observed high-dimensional data often lies on a lower-dimensional manifold.

Example: “Swiss Roll” dataset

The structure in the data means that we don’t need such high dimensionality to describe it.

The lower-dimensional space is often easier to work with.

Allows for interpolation between observed data points.

Definition of a latent variable model

Assumptions:

Assume that the observed variables actually result from a smaller set of latent variables.

Assume that the observed variables are independent given the latent variables.

Differs slightly from the dimensionality-reduction paradigm, which wishes to find a lower-dimensional embedding in the high-dimensional space.

With the latent variable model we specify the functional form of the mapping:

y = g(x) + ε

where x are the latent variables, y are the observed variables, and ε is noise.
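This generative view can be sketched numerically; a minimal numpy example, where the linear map, dimensions, and noise level are illustrative assumptions rather than anything from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

N, q, D = 500, 2, 10           # observations, latent dim, observed dim
X = rng.normal(size=(N, q))    # latent variables x
W = rng.normal(size=(D, q))    # an assumed linear mapping g(x) = Wx
noise_std = 0.1                # illustrative noise level

# y = g(x) + eps: observed data generated from the latent variables
Y = X @ W.T + noise_std * rng.normal(size=(N, D))

print(Y.shape)  # (500, 10): 10-dimensional data driven by 2 latent dims
```

Different choices of g(x) and of the noise distribution then yield the different latent variable models discussed next.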

Obtain different latent variable models for different assumptions on g(x) and ε.

Graphical model representation

Graphical Model example of a Latent Variable Model

Taken from Neil Lawrence: http://ml.dcs.shef.ac.uk/gpss/gpws14/gp_gpws14_session3.pdf

Principal Components Analysis (PCA)

Returns orthogonal dimensions of maximum variance.

Works well if the data lies on a plane in the higher-dimensional space.

Linear method (although variants allow non-linear application, e.g. kernel PCA).
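As a refresher, PCA’s maximum-variance directions can be sketched via an eigendecomposition of the sample covariance (toy data and dimensions are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # toy correlated data

Yc = Y - Y.mean(axis=0)                 # centre the data
C = Yc.T @ Yc / len(Yc)                 # sample covariance (D x D)
evals, evecs = np.linalg.eigh(C)        # eigenvalues in ascending order
order = np.argsort(evals)[::-1]         # sort by variance, descending
components = evecs[:, order[:2]]        # top-2 orthogonal directions

Z = Yc @ components                     # project onto 2 dimensions
print(Z.shape)  # (200, 2)
```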

Example application of PCA. Taken from http://www.nlpca.org/pca_principal_component_analysis.html

Probabilistic PCA (PPCA)

A probabilistic version of PCA.

The probabilistic formulation is useful for many reasons:

Allows comparison with other techniques via a likelihood measure.

Facilitates statistical testing.

Allows application of Bayesian methods.

Provides a principled way of handling missing values, via Expectation Maximisation.

PPCA Definition

Consider a set of centered data of N observations and D dimensions: Y = [y_1, . . . , y_N]ᵀ.

We assume this data has a linear relationship with some embedded latent-space data X, where Y ∈ R^{N×D} and X ∈ R^{N×q}.

y_n = W x_n + ε_n, where x_n is the q-dimensional latent variable associated with each observation, and W ∈ R^{D×q} is the transformation matrix relating the observed and latent spaces.

We assume a spherical Gaussian distribution for the noise, with a mean of zero and a covariance of β⁻¹I.

The likelihood for an observation y_n is:

p(y_n | x_n, W, β) = N(y_n | W x_n, β⁻¹I)

PPCA Derivation

Marginalise the latent variables x_n, put a Gaussian prior on x_n, and solve for W using maximum likelihood.

The prior used for x_n in the integration is a zero-mean, unit-covariance Gaussian distribution:

p(x_n) = N(x_n | 0, I)

p(y_n | W, β) = ∫ p(y_n | x_n, W, β) p(x_n) dx_n

p(y_n | W, β) = ∫ N(y_n | W x_n, β⁻¹I) N(x_n | 0, I) dx_n

p(y_n | W, β) = N(y_n | 0, WWᵀ + β⁻¹I)

Assuming i.i.d. data, the likelihood of the full set is the product of the individual probabilities:

p(Y | W, β) = ∏_{n=1}^{N} p(y_n | W, β)
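The marginal likelihood above can be evaluated directly; a hedged sketch using scipy, with W, β and the data invented for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)
N, D, q = 100, 5, 2
W = rng.normal(size=(D, q))    # illustrative transformation matrix
beta = 10.0                    # illustrative noise precision

# Sample data from the model itself: y_n = W x_n + eps_n
X = rng.normal(size=(N, q))
Y = X @ W.T + rng.normal(scale=beta ** -0.5, size=(N, D))

# p(y_n | W, beta) = N(y_n | 0, W W^T + beta^{-1} I)
C = W @ W.T + np.eye(D) / beta

# i.i.d. data: log-likelihood is the sum of per-observation log-densities
loglik = multivariate_normal(mean=np.zeros(D), cov=C).logpdf(Y).sum()
print(loglik)
```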

To calculate that marginalisation step we use the summation and scaling properties of Gaussians.

The sum of Gaussian variables is Gaussian:

∑_{i=1}^{n} N(μ_i, σ_i²) ∼ N(∑_{i=1}^{n} μ_i, ∑_{i=1}^{n} σ_i²)

Scaling a Gaussian leads to a Gaussian:

w N(μ, σ²) ∼ N(wμ, w²σ²)

So:

y = Wx + ε,  x ∼ N(0, I),  ε ∼ N(0, σ²I)

Wx ∼ N(0, WWᵀ)

Wx + ε ∼ N(0, WWᵀ + σ²I)
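These Gaussian identities are easy to check by simulation; a sketch comparing the empirical covariance of Wx + ε against WWᵀ + σ²I (sizes and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
D, q, sigma2 = 4, 2, 0.25
W = rng.normal(size=(D, q))

# Draw many samples of y = Wx + eps with x ~ N(0, I), eps ~ N(0, sigma2 I)
n_samples = 200_000
X = rng.normal(size=(n_samples, q))
eps = rng.normal(scale=np.sqrt(sigma2), size=(n_samples, D))
Y = X @ W.T + eps

empirical = np.cov(Y, rowvar=False)
theoretical = W @ W.T + sigma2 * np.eye(D)
print(np.max(np.abs(empirical - theoretical)))  # close to zero
```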

Can find a solution for W by maximising the likelihood.

Results in an eigenvalue problem.

It turns out that the closed-form solution for W is achieved when W spans the principal sub-space of the data1.

Same solution as PCA, hence “Probabilistic PCA”.

Can it be extended to capture non-linear features?
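For concreteness, the Tipping-Bishop closed-form solution can be sketched in numpy: take the top-q eigenvectors U_q and eigenvalues Λ_q of the sample covariance, set σ² to the mean of the discarded eigenvalues, and form W = U_q(Λ_q − σ²I)^{1/2}, taking the arbitrary rotation R = I; the toy data here is made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
Y = rng.normal(size=(300, 6)) @ rng.normal(size=(6, 6))  # toy data
Yc = Y - Y.mean(axis=0)                  # centre the data
q = 2

S = Yc.T @ Yc / len(Yc)                  # sample covariance
evals, evecs = np.linalg.eigh(S)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

sigma2 = evals[q:].mean()                # noise = mean of discarded eigenvalues
U_q, L_q = evecs[:, :q], np.diag(evals[:q])

# ML weights (R = I): W spans the principal sub-space of the data
W_ml = U_q @ np.sqrt(L_q - sigma2 * np.eye(q))
print(W_ml.shape)  # (6, 2)
```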

1Michael E. Tipping and Christopher M. Bishop. “Probabilistic principal component analysis.” (1997).

Dual PPCA

Similar to the previous derivation of PPCA.

But marginalise W and optimise x_n.

Same linear-Gaussian relationship between latent variables and data:

p(Y | X, W, β) = ∏_{d=1}^{D} N(y_{:,d} | X w_{d,:}, β⁻¹I)

Place a conjugate prior on W:

p(W) = ∏_{d=1}^{D} N(w_{d,:} | 0, I)

The resulting marginal likelihood is:

p(Y | X, β) = ∏_{d=1}^{D} N(y_{:,d} | 0, XXᵀ + β⁻¹I)

Results in an eigenvalue problem equivalent to PPCA.

So what is the benefit?

The eigendecomposition is now done on an N × N matrix instead of a D × D matrix.

Recall the marginal likelihood:

p(Y | X, β) = ∏_{d=1}^{D} N(y_{:,d} | 0, XXᵀ + β⁻¹I)

The covariance matrix is a covariance function:

K = XXᵀ + β⁻¹I

This linear kernel can be replaced by other covariance functions for non-linearity.
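A sketch of that swap in numpy: the same N × N covariance, built once with the linear kernel XXᵀ + β⁻¹I and once with an RBF (exponentiated quadratic) kernel; the hyperparameter values are illustrative assumptions:

```python
import numpy as np

def linear_kernel(X, beta=100.0):
    # Dual PPCA covariance: K = X X^T + beta^{-1} I
    return X @ X.T + np.eye(len(X)) / beta

def rbf_kernel(X, variance=1.0, lengthscale=1.0, beta=100.0):
    # GPLVM with an exponentiated-quadratic covariance function
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq / lengthscale**2) + np.eye(len(X)) / beta

rng = np.random.default_rng(5)
X = rng.normal(size=(8, 2))   # latent positions
K_lin, K_rbf = linear_kernel(X), rbf_kernel(X)
print(K_lin.shape, K_rbf.shape)  # (8, 8) (8, 8)
```

With the non-linear kernel, the same marginal-likelihood form is retained but the mapping from latent to observed space is no longer linear.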

This is the GPLVM.

GPLVM

Each dimension of the marginal distribution can be interpreted as an independent Gaussian process2.

Dual PPCA is the special case where the output dimensions are assumed to be linear, independent and identically distributed.

GPLVM removes the assumption of linearity.

Gaussian prior over the function space.

Choice of covariance function changes the family of functions considered.

Popular kernels:

Exponentiated Quadratic (RBF) kernel

Matérn kernels

Periodic kernels

Many more...
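Sketches of two of these covariance functions on scalar inputs (the hyperparameter defaults are illustrative assumptions):

```python
import numpy as np

def matern32(x1, x2, lengthscale=1.0, variance=1.0):
    # Matérn 3/2 kernel on scalar inputs
    r = np.abs(x1 - x2) / lengthscale
    return variance * (1 + np.sqrt(3) * r) * np.exp(-np.sqrt(3) * r)

def periodic(x1, x2, period=2.0, lengthscale=1.0, variance=1.0):
    # Standard periodic kernel on scalar inputs
    s = np.sin(np.pi * np.abs(x1 - x2) / period)
    return variance * np.exp(-2 * (s / lengthscale) ** 2)

x = np.linspace(0, 4, 5)
K = np.array([[matern32(a, b) for b in x] for a in x])
print(np.allclose(K, K.T))  # True: covariance matrices are symmetric
```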

2Neil Lawrence. “Probabilistic non-linear principal component analysis with Gaussian process latent variable models.” JMLR (2005).

Example: Frey Face data

Example in GPMat

Example: Motion Capture data

Taken from Keith Grochow, et al. “Style-based inverse kinematics.” ACM Transactions on Graphics (TOG). Vol. 23. No. 3. ACM, 2004.

Practical points

Need to optimise a non-convex objective function.

Achieved using gradient-based methods (scaled conjugate gradients).

Several restarts to attempt to avoid local optima.

Cannot guarantee a global optimum.

High computational cost for large datasets.

May need to optimise over the most informative subset of the data, the “active set”, for sparsification.
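The restart strategy can be sketched generically with scipy; the objective below is a stand-in non-convex function, not the GPLVM likelihood:

```python
import numpy as np
from scipy.optimize import minimize

def objective(x):
    # Stand-in non-convex objective with several local optima
    return np.sin(3 * x[0]) + 0.1 * x[0] ** 2

rng = np.random.default_rng(6)
results = []
for _ in range(10):                    # several random restarts
    x0 = rng.uniform(-5, 5, size=1)    # random initialisation
    results.append(minimize(objective, x0, method="CG"))

best = min(results, key=lambda r: r.fun)  # keep the best local optimum found
print(best.fun)
```

Even with restarts, only the best local optimum found is returned; a global optimum is not guaranteed.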

Initialisation can have a large effect on the final results.

Effect of poor initialisation on the Swiss Roll dataset. PCA left, Isomap right. Taken from “Probabilistic non-linear principal component analysis with Gaussian process latent variable models.”, Neil Lawrence, JMLR (2005).

Variants

There are a number of variants of the GPLVM.

For example, the GPLVM uses the same covariance function for each output dimension. This can be changed: for example, the Scaled GPLVM introduces a scaling parameter for each output dimension3.

The Gaussian Process Dynamical Model (GPDM) adds another Gaussian process for dynamical mappings4.

The Bayesian GPLVM approximates integrating over both the latent variables and the mapping function5.

3Keith Grochow, et al. “Style-based inverse kinematics.” ACM Transactions on Graphics (TOG). Vol. 23. No. 3. ACM, 2004.

4Wang, Jack, David J. Fleet, and Aaron Hertzmann. “Gaussian process dynamical models.” Advances in Neural Information Processing Systems. 2005.

5Titsias, Michalis, and Neil Lawrence. “Bayesian Gaussian process latent variable model.” (2010).

Variants

The Shared GPLVM learns mappings from a shared latent space to two separate observational spaces.

Used by Disney Research in their paper “Animating Non-Humanoid Characters with Human Motion Data” for generating animations for non-human characters from human motion-capture data.

Shared GPLVM mappings as used by Disney Research

Video

Variants

Can also put a Gaussian process prior on X to produce Deep Gaussian Processes6.

Zhang et al. developed the Invariant GPLVM7, which permits interpretation of causal relations between observed variables by allowing arbitrary noise correlations between the latent variables.

Currently attempting to implement IGPLVM in GPy.

6Damianou, Andreas C., and Neil D. Lawrence. “Deep Gaussian Processes.” arXiv preprint arXiv:1211.0358 (2012).

7Zhang, K., Schölkopf, B., and Janzing, D. “Invariant Gaussian Process Latent Variable Models and Application in Causal Discovery.” UAI 2010.

Conclusion

Implemented in GPy (Python) and GPMat (MATLAB).

Many practical applications: pose modelling, tweening.

Especially if smooth interpolation is desirable.

Modelling of confounders.

Thanks for your time

Questions?

References

Neil Lawrence. “Gaussian process latent variable models for visualisation of high dimensional data.” Advances in Neural Information Processing Systems 16 (2004): 329-336.

Neil Lawrence. “Probabilistic non-linear principal component analysis with Gaussian process latent variable models.” JMLR (2005).

Gaussian Process Winter School, Sheffield 2013: http://ml.dcs.shef.ac.uk/gpss/gpws14/

WikiCourseNote: http://wikicoursenote.com/wiki/Probabilistic_PCA_with_GPLVM