This page reproduces the content of http://www.slideshare.net/sscdotopen/latent-factor-models-for-collaborative-filtering .

AIM3 – Scalable Data Analysis and Data Mining

11 – Latent factor models for Collaborative Filtering

Sebastian Schelter, Christoph Boden, Volker Markl

Fachgebiet Datenbanksysteme und Informationsmanagement

Technische Universität Berlin

http://www.dima.tu-berlin.de/

20.06.2012

DIMA – TU Berlin

1 - Recap: Item-Based Collaborative Filtering

Item-based Collaborative Filtering

• compute pairwise similarities of the columns of

the rating matrix using some similarity measure

• store the top 20 to 50 most similar items per item

in the item-similarity matrix

• prediction: use a weighted sum over all items

similar to the unknown item that have been

rated by the current user

p_ui = Σ_{j ∈ S(i,u)} s_ij · r_uj  /  Σ_{j ∈ S(i,u)} |s_ij|

where S(i,u) is the set of items similar to i that user u has rated
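As an illustration, here is a minimal numpy sketch of this weighted-sum prediction; the toy rating matrix and the choice of cosine similarity are hypothetical, not taken from the slides:

```python
import numpy as np

def predict(ratings, similarities, user, item, k=20):
    """Predict p_ui as a similarity-weighted average over the user's
    rated items that are most similar to the target item.
    `ratings` is a (users x items) array with 0 for unknown entries,
    `similarities` a precomputed (items x items) similarity matrix."""
    rated = np.flatnonzero(ratings[user])      # items the user has rated
    sims = similarities[item, rated]
    top = rated[np.argsort(-sims)][:k]         # top-k most similar rated items
    s = similarities[item, top]
    return s @ ratings[user, top] / np.abs(s).sum()

# toy data: 3 users x 4 items, 0 = unknown
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 5, 4]], dtype=float)

# cosine similarity between item columns (one possible similarity measure)
norms = np.linalg.norm(R, axis=0)
S = (R.T @ R) / np.outer(norms, norms)

print(predict(R, S, user=1, item=1))
```

The denominator sums absolute similarities so the prediction stays a proper weighted average of the user's own ratings.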


2 - Drawbacks of similarity-based neighborhood methods

• the assumption that a rating is defined by all the

user's ratings for commonly co-rated items is

hard to justify in general

• lack of bias correction

• every co-rated item is looked at in isolation:

say a movie is similar to „Lord of the Rings", do we

want each part of the trilogy to contribute as a

separate similar item?

• the best choice of similarity measure is based on

experimentation, not on mathematical reasoning


3 - Latent factor models

■ Idea

• ratings are deeply influenced by a set of factors that are

very specific to the domain (e.g. amount of action in movies,

complexity of characters)

• these factors are in general not obvious; we might be able to

think of some of them, but it's hard to estimate their impact on

the ratings

• the goal is to infer those so-called latent factors from the

rating data by using mathematical techniques


4 - Latent factor models

■ Approach

• users and items are characterized by n_f latent

factors: each user i and each item j is mapped onto

vectors u_i, m_j ∈ R^{n_f} in a latent feature space

• each rating is approximated by the dot

product of the user feature vector

and the item feature vector:  r_ij ≈ m_j^T u_i

• prediction of unknown ratings also uses

this dot product

• squared error as a measure of loss:  (r_ij − m_j^T u_i)²


5 - Latent factor models

■ Approach

• decomposition of the rating matrix into the product of a user

feature and an item feature matrix

• row in U: vector of a user's affinity to the features

• row in M: vector of an item's relation to the features

• closely related to Singular Value Decomposition, which

produces an optimal low-rank approximation of a matrix:

R ≈ U M^T
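For a fully known matrix, this connection can be sketched with numpy: truncating the SVD to the k strongest singular values gives the best rank-k approximation (in Frobenius norm), which factors into a user feature matrix and an item feature matrix. The toy matrix and k are hypothetical:

```python
import numpy as np

# a fully known toy rating matrix (SVD needs all entries defined)
R = np.array([[5., 3., 4., 1.],
              [4., 3., 4., 1.],
              [1., 1., 2., 5.]])

k = 2
U, s, Vt = np.linalg.svd(R, full_matrices=False)
R_k = (U[:, :k] * s[:k]) @ Vt[:k, :]     # best rank-k approximation of R

# split the singular values to get user and item feature matrices
U_feat = U[:, :k] * np.sqrt(s[:k])       # rows: user feature vectors
M_feat = Vt[:k, :].T * np.sqrt(s[:k])    # rows: item feature vectors

print(np.allclose(R_k, U_feat @ M_feat.T))  # True
```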


6 - Latent factor models

■ Properties of the decomposition

• automatically ranks features by their „impact“ on the ratings

• features might not necessarily be intuitively understandable


7 - Latent factor models

■ Problematic situation with explicit feedback data

• the rating matrix is not only sparse but only partially defined;

missing entries cannot be interpreted as 0, they are simply

unknown

• standard decomposition algorithms like the Lanczos method for

SVD are not applicable

■ Solution

• decomposition has to be done using the known ratings only

• find the set of user and item feature vectors that minimizes the

squared error to the known ratings

min_{U,M} Σ_{(i,j) known} (r_ij − m_j^T u_i)²


8 - Latent factor models

■ quality of the decomposition is not measured with respect to

the reconstruction error to the original data, but with

respect to the generalization to unseen data

■ regularization necessary to avoid overfitting

■ model has hyperparameters (regularization, learning rate)

that need to be chosen

■ process: split data into training, test and validation set

□ train model using the training set

□ choose hyperparameters according to performance on the test set

□ evaluate generalization on the validation set

□ ensure that each datapoint is used in each set once

(cross-validation)
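The splitting process can be sketched as a k-fold partition in which every datapoint lands in the held-out fold exactly once; the data and fold count here are hypothetical:

```python
import numpy as np

ratings = np.arange(20)             # stand-in for (user, item, rating) records
rng = np.random.default_rng(0)
perm = rng.permutation(len(ratings))

k = 5                               # 5-fold cross-validation
folds = np.array_split(perm, k)
for f in range(k):
    test_idx = folds[f]             # held-out fold for this round
    train_idx = np.concatenate([folds[g] for g in range(k) if g != f])
    # train the model on ratings[train_idx], evaluate on ratings[test_idx]
    print(len(train_idx), len(test_idx))
```

Each index appears in exactly one `test_idx` across the k rounds, which is the "each datapoint is used in each set once" property.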


9 - Stochastic Gradient Descent

• add a regularization term:

min_{U,M} Σ_{(i,j)} (r_ij − m_j^T u_i)² + λ (‖u_i‖² + ‖m_j‖²)

• loop through all ratings in the training set and compute

the associated prediction error:

e_ij = r_ij − m_j^T u_i

• modify the parameters in the opposite direction of the gradient:

u_i ← u_i + γ (e_ij · m_j − λ · u_i)

m_j ← m_j + γ (e_ij · u_i − λ · m_j)

• problem: approach is inherently sequential (although recent

research might have unveiled a parallelization technique)
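A minimal sketch of these update rules on toy data; the ratings, learning rate and regularization strength are hypothetical choices, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# observed ratings as (user i, item j, rating r_ij) triples
ratings = [(0, 0, 5.0), (0, 3, 1.0), (1, 0, 4.0), (1, 3, 1.0),
           (2, 1, 1.0), (2, 2, 5.0), (2, 3, 4.0)]
n_users, n_items, n_f = 3, 4, 2

U = rng.normal(scale=0.1, size=(n_users, n_f))   # user feature vectors u_i
M = rng.normal(scale=0.1, size=(n_items, n_f))   # item feature vectors m_j
gamma, lam = 0.01, 0.02                          # learning rate, regularization

for epoch in range(1000):
    for i, j, r in ratings:                      # sequential pass over ratings
        e = r - M[j] @ U[i]                      # prediction error e_ij
        U[i] += gamma * (e * M[j] - lam * U[i])  # step against the gradient
        M[j] += gamma * (e * U[i] - lam * M[j])

print(M[0] @ U[0])   # trained to approximate the observed rating 5.0
```

The inner loop touches one rating at a time and updates shared parameters, which is why the approach is hard to parallelize naively.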


10 - Alternating Least Squares with Weighted λ-Regularization

■ Model

• feature matrices are modeled directly by using only

the observed ratings

• add a regularization term to avoid overfitting

• minimize regularized error of:

f(U, M) = Σ_{(i,j)} (r_ij − m_j^T u_i)² + λ (Σ_i n_{u_i} ‖u_i‖² + Σ_j n_{m_j} ‖m_j‖²)

where n_{u_i} and n_{m_j} denote the number of ratings of user i and of item j

■ Solving technique

• fix one of the unknown matrices, which turns the problem into

a simple quadratic (least-squares) problem

• rotate between fixing U and M until convergence

(„Alternating Least Squares")
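A sketch of the alternating solves with the weighted-λ regularization; the toy matrix, rank and λ are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# 0 marks an unknown rating in this toy matrix
R = np.array([[5., 3., 0., 1.],
              [4., 0., 0., 1.],
              [1., 1., 5., 4.]])
known = R > 0
n_users, n_items = R.shape
n_f, lam = 2, 0.1

U = rng.normal(scale=0.1, size=(n_users, n_f))   # user feature vectors
M = rng.normal(scale=0.1, size=(n_items, n_f))   # item feature vectors

for _ in range(20):
    # fix M, solve a regularized least-squares problem per user;
    # the weight lam * n_{u_i} is the "weighted λ" of ALS-WR
    for i in range(n_users):
        idx = np.flatnonzero(known[i])
        A = M[idx].T @ M[idx] + lam * len(idx) * np.eye(n_f)
        U[i] = np.linalg.solve(A, M[idx].T @ R[i, idx])
    # fix U, solve per item
    for j in range(n_items):
        idx = np.flatnonzero(known[:, j])
        A = U[idx].T @ U[idx] + lam * len(idx) * np.eye(n_f)
        M[j] = np.linalg.solve(A, U[idx].T @ R[idx, j])

print(U[0] @ M[0])   # reconstructed rating for user 0, item 0
```

Note that within one half-iteration every u_i (or every m_j) can be computed independently, which is exactly the property the next slide exploits for parallelization.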


11 - ALS-WR is scalable

■ Which properties make this approach scalable?

• all the features in one iteration can be computed

independently of each other

• only a small portion of the data is necessary to compute

a feature vector

■ Parallelization with Map/Reduce

• Computing user feature vectors: the mappers need to send

each user's rating vector and the feature vectors of his/her

rated items to the same reducer

• Computing item feature vectors: the mappers need to send

each item's rating vector and the feature vectors of users who

rated it to the same reducer


12 - Incorporating biases

■ Problem: explicit feedback data is highly biased

□ some users tend to rate more extremely than others

□ some items tend to get higher ratings than others

■ Solution: explicitly model biases

□ the bias of a rating is modeled as a combination of the average

rating μ, the user bias and the item bias:

b_ij = μ + b_i + b_j

□ the rating bias can be incorporated into the prediction:

r̂_ij = μ + b_i + b_j + m_j^T u_i
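A minimal sketch of a bias-only prediction; here the biases are estimated as simple mean deviations, a simplification compared to fitting them jointly with the latent factors:

```python
import numpy as np

# toy explicit ratings, 0 = unknown
R = np.array([[5., 3., 0., 1.],
              [4., 0., 0., 1.],
              [1., 1., 5., 4.]])
known = R > 0

mu = R[known].mean()   # average over all known ratings
# user bias: how far a user's mean rating deviates from mu
b_user = np.array([R[i, known[i]].mean() - mu for i in range(R.shape[0])])
# item bias: how far an item's mean rating deviates from mu
b_item = np.array([R[known[:, j], j].mean() - mu for j in range(R.shape[1])])

# bias-only prediction for user 1, item 1 (no latent factors yet)
print(mu + b_user[1] + b_item[1])
```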


13 - Latent factor models

■ implicit feedback data is very different from explicit data!

□ e.g. use the number of clicks on a product page of an online shop

□ the whole matrix is defined!

□ no negative feedback

□ interactions that did not happen produce zero values

□ however, we should place only little confidence in these zeros (maybe the

user never had the chance to interact with these items)

□ using standard decomposition techniques like SVD would give us a

decomposition that is biased towards the zero entries, again not

applicable


14 - Latent factor models

■ Solution for working with implicit data:

weighted matrix factorization

■ create a binary preference matrix P:

p_ij = 1 if r_ij > 0

p_ij = 0 if r_ij = 0

■ each entry in this matrix can be weighted

by a confidence function

□ zero values should get low confidence

□ values that are based on a lot of interactions

should get high confidence, e.g.:

c(i, j) = 1 + α · r_ij

■ confidence is incorporated into the model

□ the factorization will ‚prefer‘ more confident values

f(U, M) = Σ_{i,j} c(i, j) (p_ij − m_j^T u_i)² + λ (Σ_i ‖u_i‖² + Σ_j ‖m_j‖²)
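One confidence-weighted least-squares step of this model can be sketched as follows; the click counts, α, λ and rank are hypothetical, and a full solver would alternate over all users and items as in ALS:

```python
import numpy as np

# implicit feedback: click counts, the whole matrix is defined
R = np.array([[3., 0., 1., 0.],
              [0., 0., 5., 2.],
              [1., 4., 0., 0.]])

P = (R > 0).astype(float)   # binary preference matrix p_ij
alpha = 40.0                # confidence scaling constant
C = 1.0 + alpha * R         # confidence c(i, j); zeros keep low confidence 1

# one weighted, regularized least-squares solve for user 0, items fixed
rng = np.random.default_rng(0)
n_f, lam = 2, 0.1
M = rng.normal(scale=0.1, size=(R.shape[1], n_f))  # item feature vectors
Cu = np.diag(C[0])                                 # per-entry confidences
A = M.T @ Cu @ M + lam * np.eye(n_f)
u0 = np.linalg.solve(A, M.T @ Cu @ P[0])           # new feature vector u_0
print(u0)
```

Because every entry contributes, but with weight c(i, j), the factorization 'prefers' the values backed by many interactions over the low-confidence zeros.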


15 - Sources

• Sarwar et al.: „Item-Based Collaborative Filtering

Recommendation Algorithms“, 2001

• Koren et al.: „Matrix Factorization Techniques for Recommender

Systems“, 2009

• Funk: „Netflix Update: Try This at Home“,

http://sifter.org/~simon/journal/20061211.html, 2006

• Zhou et al.: „Large-scale Parallel Collaborative Filtering for the

Netflix Prize“, 2008

• Hu et al.: „Collaborative Filtering for Implicit Feedback

Datasets“, 2008
