This page reproduces the content of http://www.slideshare.net/DaichiKitamura/efficient-initialization-for-nonnegative-matrix-factorization-based-on-nonnegative-independent-component-analysis.


Uploaded about one month ago (2016/09/16).


Daichi Kitamura, Nobutaka Ono, "Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis," The 15th International Workshop on Acoustic Signal Enhancement (IWAENC 2016), Xi’an, China, September 2016.


- IWAENC 2016, Sept. 16, 08:30 - 10:30,

Session SPS-II - Student paper competition 2

SPC-II-04

Efficient initialization for NMF

based on nonnegative ICA

Daichi Kitamura (SOKENDAI, Japan)

Nobutaka Ono (NII/SOKENDAI, Japan) - Research background: what is NMF?

• Nonnegative matrix factorization (NMF) [Lee, 1999]

– Dimensionality reduction with nonnegative constraint

– Unsupervised learning extracting meaningful features

– Sparse decomposition (implicitly)

[Figure: the input data matrix (power spectrogram; frequency × time, amplitude) is decomposed into a basis matrix (spectral patterns; frequency × number of bases) and an activation matrix (time-varying gains; number of bases × time).]

2/19 - Research background: how to optimize?

• Optimization in NMF

– Define a cost function (data fidelity) and minimize it

– No closed-form solution for the basis and activation matrices

– Efficient iterative optimization

• Multiplicative update rules (auxiliary function technique) [Lee, 2001]

(when the cost function is the squared Euclidean distance)
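As a concrete illustration of these updates, here is a minimal NumPy sketch of EU-NMF (the function name, random initialization, and iteration count are illustrative choices, not details from the slides):

```python
import numpy as np

def nmf_mu(X, K, n_iter=200, seed=0, eps=1e-12):
    """Euclidean NMF via multiplicative updates [Lee, 2001] (minimal sketch).

    X: nonnegative input matrix (e.g., a power spectrogram), shape (I, J).
    K: number of bases. Returns basis W (I, K) and activations H (K, J).
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], K)) + eps  # random nonnegative initial bases
    H = rng.random((K, X.shape[1])) + eps  # random nonnegative initial activations
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H stay
        # nonnegative and the cost ||X - WH||_F^2 is non-increasing.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H
```

Because the updates only rescale the initial values multiplicatively, the converged factors depend on the initialization, which is exactly the issue this talk addresses.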

3/19

– Initial values for all the variables are required. - Problem and motivation

• Results of all applications using NMF depend on

the initialization of the basis and activation matrices.

– Ex.: source separation via full-supervised NMF [Smaragdis, 2007]

[Figure: SDR improvements [dB] of full-supervised NMF for ten different random seeds (Rand1 to Rand10); the results range from poor to good, differing by more than 1 dB across seeds.]
4/19

• Motivation: an initialization method that consistently

gives good performance is desired. - Conventional NMF initialization techniques

• With random values (not focused here)

– Directly use random values

– Search good values via genetic algorithm [Stadlthanner, 2006], [Janecek, 2011]

– Clustering-based initialization [Zheng, 2007], [Xue, 2008], [Rezaei, 2011]

• Cluster the input data, and set the centroid vectors as the initial basis vectors.

• Without random values

– PCA-based initialization [Zhao, 2014]

• Apply PCA to the input data, extract orthogonal bases and coefficients, and set their absolute values as the initial bases and activations.

– SVD-based initialization [Boutsidis, 2008]

• Apply a special SVD (nonnegative double SVD) to the input data and set the nonnegative left and right singular vectors as the initial values.

5/19 - Basis orthogonality?

• Are orthogonal bases really better for NMF?

– PCA and SVD are orthogonal decompositions.

– A geometric interpretation of NMF [Donoho, 2003]

• The optimal bases in NMF are “along the edges of a convex cone” that includes all the observed data points.

[Figure: a convex cone containing the data points. The optimal bases lie along the edges of the cone and are satisfactory for representing all the data points; orthogonal bases risk representing a meaningless area outside the cone; tight bases cannot represent all the data points.]

6/19

– Orthogonal bases might not be good initial values for NMF. - Proposed method: utilization of ICA

• What can we do from only the input data?

– Independent component analysis (ICA) [Comon, 1994]

– ICA extracts non-orthogonal bases

• that maximize the statistical independence between the sources.

– ICA estimates sparse sources

• when we assume a super-Gaussian prior.

• Propose to use the ICA bases and estimated sources as initial NMF values

– Objectives:

• 1. Deeper minimization

• 2. Faster convergence

• 3. Better performance

[Figure: value of the NMF cost function vs. number of update iterations, illustrating deeper minimization and faster convergence.]

7/19 - Proposed method: concept

• The input data matrix is a mixture of some sources.

– The sources in the source matrix are mixed via the mixing matrix, then observed as the input data matrix

[Figure: input data matrix = mixing matrix × source matrix; a PCA matrix is used for dimensionality reduction, and ICA extracts mutually independent bases.]

– ICA can estimate a demixing matrix and the independent sources.

[Flowchart: input data matrix → PCA → NICA → nonnegativization → initial values → NMF]

• PCA for only the dimensionality reduction in NMF

• Nonnegative ICA for taking nonnegativity into account

• Nonnegativization for ensuring complete nonnegativity

8/19 - Nonnegative constrained ICA

• Nonnegative ICA (NICA) [Plumbley, 2003]

– estimates the demixing matrix so that all of the separated sources become nonnegative.

– finds a rotation matrix for the pre-whitened mixtures.

[Figure: observed signals → whitening without centering → pre-whitened signals → rotation (demixing) → separated signals]

– Steepest gradient descent for estimating the rotation matrix

Cost function: J = (1/2) E[ ||z - W^T y+||^2 ], where y = Wz and y+ = max(y, 0) element-wise
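The steepest-descent NICA update can be sketched as follows (an illustrative NumPy sketch: the step size, iteration count, identity start, and the SVD-based re-projection onto rotations are my own choices, not details from the slides; for an orthonormal rotation the reconstruction cost equals (1/2) E[||min(y, 0)||^2], which is what the code minimizes):

```python
import numpy as np

def nica(Z, n_iter=500, lr=0.1):
    """Nonnegative ICA [Plumbley, 2003], minimal sketch.

    Z: pre-whitened (uncentered) mixtures, shape (K, T).
    Minimizes J(W) = (1/2) E[||min(W z, 0)||^2] over rotations W; the cost
    is zero when all separated sources W @ Z are nonnegative.
    """
    K, T = Z.shape
    W = np.eye(K)  # start from the identity rotation
    for _ in range(n_iter):
        Y = W @ Z
        Yn = np.minimum(Y, 0.0)              # negative parts of the outputs
        G = (Y @ Yn.T - Yn @ Y.T) / (2 * T)  # skew-symmetric gradient on rotations
        W = W + lr * (G @ W)                  # first-order descent step
        U, _, Vt = np.linalg.svd(W)
        W = U @ Vt                            # re-project onto orthonormal matrices
    return W
```

The skew-symmetric gradient keeps the search on the rotation group, matching the "rotation for pre-whitened mixtures" view above.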

9/19 - Combine PCA for dimensionality reduction

• Dimensionality reduction via PCA

[Figure: eigenvalues sorted from high to low; the PCA matrix keeps only the top eigenvectors (its rows are eigenvectors of the covariance matrix) and discards the rest (zero matrix), giving the reduced ICA bases and sources.]

• NMF variables obtained from the estimates of NICA

– Suppose that the rotation matrix and the sources are estimated by NICA,

– then we have the initial basis matrix and the initial activation matrix.
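Assembling the initial NMF factors from the PCA and NICA estimates can be sketched as follows (a sketch under my own assumptions about the exact whitening and reconstruction; the slides' symbols are images and are not reproduced here):

```python
import numpy as np

def nica_init(X, K, nica_fn):
    """Sketch: build candidate NMF initial values from PCA + NICA.

    X: input data matrix (I, J); K: number of bases.
    nica_fn: routine returning a rotation matrix for whitened data (e.g., NICA).
    """
    # PCA / whitening without centering: keep the top-K eigenpairs of X X^T / J.
    C = X @ X.T / X.shape[1]
    d, V = np.linalg.eigh(C)
    idx = np.argsort(d)[::-1][:K]
    E, lam = V[:, idx], d[idx]
    Z = (E / np.sqrt(lam)).T @ X   # whitened, reduced data (K, J)
    W = nica_fn(Z)                 # rotation estimated by NICA
    S = W @ Z                      # separated sources -> activation candidates
    B = (E * np.sqrt(lam)) @ W.T   # reconstruction bases -> basis candidates
    # X ≈ B @ S up to the PCA truncation; B and S still need nonnegativization.
    return B, S
```

With a trivial rotation and K equal to the rank of X, B @ S reconstructs X exactly; truncating to fewer bases introduces the PCA approximation error.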

10/19 - Nonnegativization

• Even if we use NICA, there is no guarantee that

– the obtained sources become completely nonnegative, because of the dimensionality reduction by PCA.

– As for the obtained bases (ICA bases), nonnegativity is not assumed in NICA.

• Take a “nonnegativization” for the obtained bases and sources:

– Method 1: (correlation between … and …)

– Method 2:

– Method 3: (correlation between … and …)

• where the scale-fitting coefficients depend on the divergence of the following NMF
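The slides' exact formulas are images and are not reproduced above, but the idea can be illustrated with the simplest possible variant (my own rectify-and-rescale sketch, not necessarily any of Methods 1–3):

```python
import numpy as np

def nonnegativize(B, S, X, eps=1e-12):
    """Illustrative nonnegativization: rectify negative entries of the
    candidate bases B and sources S, then fit a single scale coefficient
    a = argmin ||X - a * W H||_F^2 (closed form for the Euclidean case)."""
    W = np.maximum(B, 0.0) + eps              # clip negative basis entries
    H = np.maximum(S, 0.0) + eps              # clip negative source entries
    R = W @ H
    a = float(np.sum(X * R) / np.sum(R * R))  # least-squares scale
    return a * W, H
```

For the KL or IS cost the optimal scale has a different closed form, which is why the scale-fitting coefficients depend on the divergence of the following NMF.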

11/19 - Experiment: conditions

• Power spectrogram of mixture with Vo. and Gt.

– Song: “Actions – One Minute Smile” from SiSEC2015

– Size of power spectrogram: 2049 x 1290 (60 sec.)

– Number of bases:

[Figure: power spectrogram of the mixture; frequency [kHz] vs. time [s]]

12/19 - Experiment: results of NICA

• Convergence of cost function in NICA

[Figure: value of the NICA cost function vs. number of iterations (0 to 2000) for the steepest gradient descent; the cost decreases from about 0.6 toward 0.0.]

13/19 - Experiment: results of Euclidean NMF

• Convergence of EU-NMF

Rand1~10 are based on random initialization with different seeds.

[Figure: cost function value in EU-NMF vs. number of iterations (up to 1000), comparing NICA1, NICA2, NICA3 (proposed), PCA-based initialization, NNDSVD, and Rand1~Rand10; the proposed methods reach lower cost values than SVD, PCA, and the random seeds.]

Processing time for initialization: NICA 4.36 s, PCA 0.98 s, SVD 2.40 s; EU-NMF itself: 12.78 s (for 1000 iterations).

14/19 - Experiment: results of Kullback-Leibler NMF

• Convergence of KL-NMF

Rand1~10 are based on random initialization with different seeds.

[Figure: cost function value in KL-NMF vs. number of iterations (up to 1000), comparing NICA1, NICA2, NICA3 (proposed), PCA-based initialization, NNDSVD, and Rand1~Rand10; the proposed methods reach lower cost values than SVD, PCA, and the random seeds.]

Processing time for initialization: NICA 4.36 s, PCA 0.98 s, SVD 2.40 s; KL-NMF itself: 48.07 s (for 1000 iterations).

15/19 - Experiment: results of Itakura-Saito NMF

• Convergence of IS-NMF

Rand1~10 are based on random initialization with different seeds.

[Figure: cost function value in IS-NMF (scale ×10^6, roughly 1.45 to 1.70) vs. number of iterations (up to 1000), comparing NICA1, NICA2, NICA3 (proposed), PCA-based initialization, NNDSVD, and Rand1~Rand10; the proposed methods reach lower cost values than SVD, PCA, and the random seeds.]

Processing time for initialization: NICA 4.36 s, PCA 0.98 s, SVD 2.40 s; IS-NMF itself: 214.26 s (for 1000 iterations).

16/19 - Experiment: full-supervised source separation

• Full-supervised NMF [Smaragdis, 2007]

– Simply use pre-trained sourcewise bases for separation

[Diagram: in the training stage, sourcewise bases are pre-trained on each source, with the variables initialized by the conventional or proposed methods; in the separation stage, the pre-trained bases are fixed, and the activations are initialized based on the correlations used in the nonnegativization methods.]
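Under the Euclidean cost, the separation stage can be sketched as follows (an illustrative sketch: the random activation initialization here stands in for the correlation-based one, and the sourcewise reconstruction at the end is a common choice rather than a detail given on this slide):

```python
import numpy as np

def supervised_separate(X, W_list, n_iter=500, eps=1e-12):
    """Full-supervised NMF separation sketch [after Smaragdis, 2007].

    X: mixture power spectrogram (I, J).
    W_list: pre-trained, fixed basis matrices, one (I, K_n) per source.
    Only the activations are updated; returns one estimate per source.
    """
    W = np.hstack(W_list)                 # concatenate sourcewise bases
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], X.shape[1])) + eps
    for _ in range(n_iter):               # EU-NMF update for H, with W fixed
        H *= (W.T @ X) / (W.T @ W @ H + eps)
    # Split H back into sourcewise blocks and reconstruct each source model.
    out, k0 = [], 0
    for Wn in W_list:
        k1 = k0 + Wn.shape[1]
        out.append(Wn @ H[k0:k1])         # sourcewise model W_n H_n
        k0 = k1
    return out
```

Because the bases are fixed, the quality of the pre-trained bases (and hence their initialization in the training stage) directly determines the separation performance.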

17/19 - Experiment: results of separation

• Two-source separation using full-supervised NMF

– SiSEC2015 MUS dataset (professionally recorded music)

– Averaged SDR improvements over 15 songs

[Figure: averaged SDR improvements [dB] for source 1 (left) and source 2 (right), comparing the proposed NICA1~NICA3 (“Prop.”) against the conventional PCA, SVD, and Rand1~Rand10 initializations (“Conv.”).]

18/19 - Conclusion

• Proposed efficient initialization method for NMF

• Utilize statistical independence for obtaining non-orthogonal bases and sources

– The orthogonality may not be preferable for NMF.

• The proposed initialization gives

– deeper minimization

– faster convergence

– better performance for full-supervised source separation

Thank you for your attention!

19/19