このページは http://www.slideshare.net/xamat/kdd-2014-tutorial-the-recommender-problem-revisited の内容を掲載しています。

掲載を希望されないスライド著者の方は、こちらよりご連絡下さい。

約2年前 (2014/08/22)にアップロードinテクノロジー

Slides for my tutorial in KDD 2014

- @xamat

The Recommender

Problem

Revisited

Xavier Amatriain

Bamshad Mobasher

Research/Engineering Director @

Professor @

Netflix

DePaul University

Xavier Amatriain – August 2014 – KDD - Index

PART I - Xavier Amatriain (2h)

1. The Recommender Problem

2. Traditional Recommendation Methods

3. Beyond Traditional Methods

PART 2 - Bamshad Mobasher (1h)

1. Context-aware Recommendations

Xavier Amatriain – August 2014 – KDD - Recent/Upcoming Publications

●

The Recommender Problem Revisited. KDD and Recsys 2014 Tutorial

●

KDD: Big & Personal: data and models behind Netflix recommendations. 2013

●

SIGKDD Explorations: Mining large streams of user data for personalized recommendations. 2012

●

Recsys: Building industrial-scale real-world recommender systems. 2012

●

Recys - Walk the Talk: Analyzing the relation between implicit and explicit feedback for preference elicitation. 2011

●

SIGIR – Temporal behavior of CF. 2010

●

Web Intelligence – Expert-based CF for music. 2010

●

Recsys – Tensor Factorization. 2010

●

Mobile HCI – Tourist Recommendation. 2010

●

Recsys Handbook (book) – Data mining for recsys. 2010 & Recommender Systems in Industry. 2014

●

SIGIR – Wisdom of the Few. 2009

●

Recsys – Denoising by re-rating. 2009

●

CARS – Implicit context-aware recommendations. 2009

●

UMAP – I like it I like it not. 2009

Xavier Amatriain – August 2014 – KDD - Index

1. The Recommender Problem

2. Traditional Recommendation Methods

2.1. Collaborative Filtering

2.2. Content-based Recommendations

2.3. Hybrid Approaches

3. Beyond Traditional Methods

3.1. Learning to Rank

3.2. Similarity

3.3. Deep Learning

3.4. Social Recommendations

3.5. Page Optimization

3.6. Tensor Factorization and Factorization Machines

3.7. MAB Explore/Exploit

4. References

Xavier Amatriain – August 2014 – KDD - 1. The Recommender

Problem

Xavier Amatriain – August 2014 – KDD - The Age of Search has come to an end

• ... long live the Age of Recommendation!

• Chris Anderson in “The Long Tail”

• “We are leaving the age of information and entering the age

of recommendation”

• CNN Money, “The race to create a 'smart' Google”:

• “The Web, they say, is leaving the era of search and

entering one of discovery. What's the difference? Search is

what you do when you're looking for something. Discovery

is when something wonderful that you didn't know existed,

or didn't know how to ask for, finds you.”

Xavier Amatriain – August 2014 – KDD - Information overload

“People read around 10 MB worth of material a day, hear 400 MB a

day, and see 1 MB of information every second” - The Economist, November 2006

In 2015, consumption will raise to 74 GB a day - UCSD Study 2014

Xavier Amatriain – August 2014 – KDD - Everything is personalized

Xavier Amatriain – August 2014 – KDD - The “Recommender problem”

● Traditional definition: Estimate a utility

function that automatically predicts how a

user will like an item.

● Based on:

○ Past behavior

○ Relations to other users

○ Item similarity

○ Context

○ …

Xavier Amatriain – August 2014 – KDD - Recommendation as data mining

The core of the

Recommendation

Engine can be

assimilated to a general

data mining problem

(Amatriain et al. Data Mining Methods for

Recommender Systems in Recommender

Systems Handbook)

Xavier Amatriain – August 2014 – KDD - Machine Learning + all those other

things

● User Interface

● System requirements (efficiency, scalability,

privacy....)

● Serendipity

● Diversity

● Awareness

● Explanations

● …

Xavier Amatriain – August 2014 – KDD - Serendipity

● Unsought finding

● Don't recommend items the user already knows

or would have found anyway.

● Expand the user's taste into neighboring areas

by improving the obvious

● Collaborative filtering can offer controllable

serendipity (e.g. controlling how many

neighbors to use in the recommendation)

Xavier Amatriain – August 2014 – KDD - Social Support

Explanation/Support for Recommendations

Xavier Amatriain – August 2014 – KDD - Diversity & Awareness

Personalization awareness

All

Dad

Dad&Mom Daughter

All

All?

Daughter

Son

Mom

Mom

Diversity

Xavier Amatriain – August 2014 – KDD - What works

● Depends on the domain and particular problem

● However, in the general case it has been

demonstrated that the best isolated approach is CF.

○ Other approaches can be hybridized to improve

results in specific cases (cold-start problem...)

● What matters:

○ Data preprocessing: outlier removal, denoising,

removal of global effects (e.g. individual user's

average)

○ “Smart” dimensionality reduction using MF/SVD

○ Combining methods through ensembles

Xavier Amatriain – August 2014 – KDD - Evolution of the Recommender Problem

Context

4.7

Rating

Ranking

Page Optimization

Context-aware

Recommendations

Xavier Amatriain – August 2014 – KDD - Index

1. The Recommender Problem

2. “Traditional” Methods

2.1. Collaborative Filtering

2.2. Content-based Recommendations

2.3. Hybrid Approaches

3. Beyond Traditional Methods

3.1. Learning to Rank

3.2. Similarity

3.3. Deep Learning

3.4. Social Recommendations

3.5. Page Optimization

3.6. Tensor Factorization and Factorization Machines

3.7. MAB Explore/Exploit

4. References

Xavier Amatriain – August 2014 – KDD - 2. Traditional Approaches

Xavier Amatriain – August 2014 – KDD - 2.1. Collaborative Filtering

Xavier Amatriain – August 2014 – KDD - The CF Ingredients

● List of m Users and a list of n Items

● Each user has a list of items with associated opinion

○ Explicit opinion - a rating score

○ Sometime the rating is implicitly – purchase records

or listen to tracks

● Active user for whom the CF prediction task is

performed

● Metric for measuring similarity between users

● Method for selecting a subset of neighbors

● Method for predicting a rating for items not currently

rated by the active user.

Xavier Amatriain – August 2014 – KDD - Personalized vs Non-Personalised CF

● CF recommendations are personalized:

prediction based only on similar users

● Non-personalized collaborative-based

recommendation: averagge the

recommendations of ALL the users

● How would the two approaches compare?

Xavier Amatriain – August 2014 – KDD - Personalized vs. Not Personalized

● Netflix Prize: it is very

simple to produce

“reasonable”

recommendations and

extremely difficult to

improve them to become

“great”

● But there is a huge

difference in business

MAE

value between reasonable

MAE

total

Data Set users items

density

Non

ratings

Pers

and great

Pers

Jester

48483

100 3519449 0,725

0,220

0,152

MovieLens 6040

3952 1000209 0,041

0,233

0,179

EachMovie 74424 1649 2811718 0,022

0,223

0,151

Xavier Amatriain – August 2014 – KDD - User-based Collaborative Filtering

Xavier Amatriain – August 2014 – KDD - Collaborative Filtering

The basic steps:

1. Identify set of ratings for the target/active user

2. Identify set of users most similar to the target/active user

according to a similarity function (neighborhood

formation)

3. Identify the products these similar users liked

4. Generate a prediction - rating that would be given by the

target user to the product - for each one of these products

5. Based on this predicted rating recommend a set of top N

products

Xavier Amatriain – August 2014 – KDD - UB Collaborative Filtering

● A collection of user u , i=1, …n and a collection

i

of products p , j=1, …, m

j

● An n × m matrix of ratings v , with v = ? if user

ij

ij

i did not rate product j

● Prediction for user i and product j is computed

as

or

• Similarity can be computed by Pearson correlation

or

Xavier Amatriain – August 2014 – KDD - User-based CF

Example

Xavier Amatriain – August 2014 – KDD - User-based CF

Example

Xavier Amatriain – August 2014 – KDD - User-based CF

Example

Xavier Amatriain – August 2014 – KDD - User-based CF

Example

Xavier Amatriain – August 2014 – KDD - User-based CF

Example

Xavier Amatriain – August 2014 – KDD - Item-based Collaborative Filtering

Xavier Amatriain – August 2014 – KDD - Item Based CF Algorithm

● Look into the items the target user has rated

● Compute how similar they are to the target item

○ Similarity only using past ratings from other

users!

● Select k most similar items.

● Compute Prediction by taking weighted average

on the target user’s ratings on the most similar

items.

Xavier Amatriain – August 2014 – KDD - Item Similarity Computation

● Similarity: find users who have rated items and

apply a similarity function to their ratings.

● Cosine-based Similarity (difference in rating scale

between users is not taken into account)

● Adjusted Cosine Similarity (takes care of difference

in rating scale)

Xavier Amatriain – August 2014 – KDD - Challenges of memory-based CF

Xavier Amatriain – August 2014 – KDD - CF: Pros/Cons

● Requires minimal knowledge engineering efforts

● Users and products are symbols without any internal

structure or characteristics

● Produces good-enough results in most cases

Challenges:

● Sparsity – evaluation of large itemsets where user/item

interactions are under 1%.

● Scalability - Nearest neighbor require computation that

grows with both the number of users and the number of

items.

Xavier Amatriain – August 2014 – KDD - The Sparsity Problem

● Typically: large product sets, user

ratings for a small percentage of them

(e.g. Amazon: millions of books and a user may

have bought hundreds of books)

● If you represent the Netflix Prize

rating data in a User/Movie matrix you

get…

○

500,000 x 17,000 = 8.5 B positions

○

Out of which only 100M are non-zero

● Number of users needs to be ~ 0.1 x

size of the catalog

Xavier Amatriain – August 2014 – KDD - Model-based

Collaborative Filtering

Xavier Amatriain – August 2014 – KDD - Model Based CF Algorithms

● Memory based

○ Use the entire user-item database to generate a

prediction.

○ Usage of statistical techniques to find the neighbors – e.g.

nearest-neighbor.

● Model-based

○ First develop a model of user

○ Type of model:

■

Probabilistic (e.g. Bayesian Network)

■

Clustering

■

Rule-based approaches (e.g. Association Rules)

■

Classification

■

Regression

■

LDA

■

...

Xavier Amatriain – August 2014 – KDD - Model-based CF:

What we learned from the

Netflix Prize

Xavier Amatriain – August 2014 – KDD - What we were interested in:

■ High quality recommendations

Proxy question:

■ Accuracy in predicted rating

■ Improve by 10% = $1million!

Xavier Amatriain – August 2014 – KDD - 2007 Progress Prize

▪ Top 2 algorithms

▪ SVD - Prize RMSE: 0.8914

▪ RBM - Prize RMSE: 0.8990

▪ Linear blend Prize RMSE: 0.88

▪ Currently in use as part of Netflix’ rating prediction

component

▪ Limitations

▪ Designed for 100M ratings, we have 5B ratings

▪ Not adaptable as users add ratings

▪ Performance issues

Xavier Amatriain – August 2014 – KDD - SVD/MF

● X: m x n matrix (e.g., m users, n videos)

● U: m x r matrix (m users, r factors)

● S: r x r diagonal matrix (strength of each ‘factor’) (r: rank of the

matrix)

● V: r x n matrix (n videos, r factor)

Xavier Amatriain – August 2014 – KDD - Simon Funk’s SVD

● One of the most

interesting findings

during the Netflix

Prize came out of

a blog post

● Incremental,

iterative, and

approximate way

to compute the

SVD using

gradient descent

Xavier Amatriain – August 2014 – KDD - SVD for Rating Prediction

▪ User factor vectors and item-factors vector

▪ Baseline (bias) (user & item deviation from average)

▪ Predict rating as

▪ Asymetric SVD (Koren et. Al) asymmetric variation w. implicit

feedback

▪ Where

▪ are three item factor vectors

▪ Users are not parametrized, but rather represented by:

▪ R(u): items rated by user u

▪ N(u): items for which the user has given implicit preference (e.g. rated vs. not rated)

Xavier Amatriain – August 2014 – KDD - Restricted Boltzmann Machines

● Each unit is a state that can be active or not active

● Each input to a unit is associated to a weight

● The transfer function calculates a score for every unit

based on the weighted sum of inputs

● Score is passed to the activation function that calculates

the probability of the unit to be active

● Restrict the connectivity to make learning easier.

Only one layer of hidden units.

No connections between hidden units.

Hidden units are independent given visible states

Xavier Amatriain – August 2014 – KDD - RBM for Recommendations

● Each visible unit = an item

● Num. of hidden units a is parameter

● In training phase, for each user:

○

If user rated item, v is activated

i

○

Activation states of v = inputs to h

i

j

○

Based on activation, h is computed

j

○

Activation state of h becomes input to v

j

i

○

Activation state of v is recalculated

i

○

Difference between current and past

activation state for v used to update weights

i

w and thresholds

ij

●

In prediction phase:

○

For the items of the user the v are activated

i

○

Based on this the state of the h is computed

j

○

The activation of h is used as input to

j

recompute the state of vi

○

Activation probabilities are used to

recommend items

Xavier Amatriain – August 2014 – KDD - Putting all together

● Remember that current production

model includes an ensemble of both

SVD++ and RBMs

Xavier Amatriain – August 2014 – KDD - What about the final prize

ensembles?

● Our offline studies showed they were too

computationally intensive to scale

● Expected improvement not worth the engineering

effort

● Plus…. Focus had already shifted to other issues

that had more impact than rating prediction.

Xavier Amatriain – August 2014 – KDD - Clustering

Xavier Amatriain – August 2014 – KDD - Clustering

● Goal: cluster users and compute per-cluster

“typical” preferences

● Users receive recommendations computed at

the cluster level

Xavier Amatriain – August 2014 – KDD - Locality-sensitive Hashing (LSH)

● Method for grouping similar items in highly

dimensional spaces

● Find a hashing function s.t. similar items are

grouped in the same buckets

● Main application is Nearest-neighbors

○ Hashing function is found iteratively by

concatenating random hashing functions

○ Addresses one of NN main concerns:

performance

Xavier Amatriain – August 2014 – KDD - Other “interesting” clustering

techniques

● k-means and all its variations

● Affinity Propagation

● Spectral Clustering

● Non-parametric Bayesian Clustering (e.g.

HDPs)

Xavier Amatriain – August 2014 – KDD - Association Rules

Xavier Amatriain – August 2014 – KDD - Association rules

● Past purchases are interpreted as transactions of “associated”

items

● If a visitor has some interest in Book 5, she will be

recommended to buy Book 3 as well

● Recommendations are constrained to some minimum levels

of confidence

● Fast to implement and execute (e.g. A Priori algorithm)

Xavier Amatriain – August 2014 – KDD - Classifiers

Xavier Amatriain – August 2014 – KDD - Classifiers

● Classifiers are general computational models trained

using positive and negative examples

● They may take in inputs:

○ Vector of item features (action / adventure, Bruce

Willis)

○ Preferences of customers (like action / adventure)

○ Relations among item

● E.g. Logistic Regression, Bayesian Networks,

Support Vector Machines, Decision Trees, etc...

Xavier Amatriain – August 2014 – KDD - Classifiers

● Classifiers can be used in CF and CB

Recommenders

● Pros:

○ Versatile

○ Can be combined with other methods to improve accuracy

of recommendations

● Cons:

○ Need a relevant training set

○ May overfit (Regularization)

● E.g. Logistic Regression, Bayesian Networks,

Support Vector Machines, Decision Trees, etc...

Xavier Amatriain – August 2014 – KDD - Limitations of

Collaborative Filtering

Xavier Amatriain – August 2014 – KDD - Limitations of Collaborative Filtering

● Cold Start: There needs to be enough other users

already in the system to find a match. New items

need to get enough ratings.

● Popularity Bias: Hard to recommend items to

someone with unique tastes.

○ Tends to recommend popular items (items from

the tail do not get so much data)

Xavier Amatriain – August 2014 – KDD - Index

1. The Recommender Problem

2. “Traditional” Methods

2.1. Collaborative Filtering

2.2. Content-based Recommendations

2.3. Hybrid Approaches

3. Beyond Novel Methods

3.1. Learning to Rank

3.2. Similarity

3.3. Deep Learning

3.4. Social Recommendations

3.5. Page Optimization

3.6. Tensor Factorization and Factorization Machines

3.7. MAB Explore/Exploit

4. References

Xavier Amatriain – August 2014 – KDD - 2.2 Content-based Recommenders

Xavier Amatriain – August 2014 – KDD - Content-Based Recommendation

● Recommendations based on content of items rather than on

other users’ opinions/interactions

● Goal: recommend items similar to those the user liked

● Common for recommending text-based products (web

pages, usenet news messages, )

● Items to recommend are “described” by their associated

features (e.g. keywords)

● User Model structured in a “similar” way as the content:

features/keywords more likely to occur in the preferred

documents (lazy approach)

● The user model can be a classifier based on whatever

technique (Neural Networks, Naïve Bayes...)

Xavier Amatriain – August 2014 – KDD - Pros/cons of CB Approach

Pros

● No need for data on other users: No cold-start or sparsity

● Able to recommend to users with unique tastes.

● Able to recommend new and unpopular items

● Can provide explanations by listing content-features

Cons

● Requires content that can be encoded as meaningful features

(difficult in some domains/catalogs)

● Users represented as learnable function of content features.

● Difficult to implement serendipity

● Easy to overfit (e.g. for a user with few data points)

Xavier Amatriain – August 2014 – KDD - A word of caution

Xavier Amatriain – August 2014 – KDD - Index

1. The Recommender Problem

2. “Traditional” Methods

2.1. Collaborative Filtering

2.2. Content-based Recommendations

2.3. Hybrid Approaches

3. Beyond Traditional Methods

3.1. Learning to Rank

3.2. Similarity

3.3. Deep Learning

3.4. Social Recommendations

3.5. Page Optimization

3.6. Tensor Factorization and Factorization Machines

3.7. MAB Explore/Exploit

4. References

Xavier Amatriain – August 2014 – KDD - 2.3 Hybrid Approaches

Xavier Amatriain – August 2014 – KDD - Comparison of methods (FAB

system)

• Content–based

recommendation with

Bayesian classifier

• Collaborative is

standard using

Pearson correlation

• Collaboration via

content uses the

content-based user

profiles

Averaged on 44 users

Precision computed in top 3 recommendations

Xavier Amatriain – August 2014 – KDD - Hybridization Methods

Hybridization Method

Description

Weighted

Outputs from several techniques (in the form of

scores or votes) are combined with different

degrees of importance to offer final

recommendations

Switching

Depending on situation, the system changes from

one technique to another

Mixed

Recommendations from several techniques are

presented at the same time

Feature combination

Features from different recommendation sources

are combined as input to a single technique

Cascade

The output from one technique is used as input of

another that refines the result

Feature augmentation

The output from one technique is used as input

features to another

Meta-level

The model learned by one recommender is used

as input to another

Xavier Amatriain – August 2014 – KDD - Index

1. The Recommender Problem

2. “Traditional” Methods

2.1. Collaborative Filtering

2.2. Content-based Recommendations

2.3. Hybrid Approaches

3. Beyond Traditional Methods

3.1. Learning to Rank

3.2. Similarity

3.3. Deep Learning

3.4. Social Recommendations

3.5. Page Optimization

3.6. Tensor Factorization and Factorization Machines

3.7. MAB Explore/Exploit

4. References

Xavier Amatriain – August 2014 – KDD - 3. Beyond traditional approaches to

Recommendation

Xavier Amatriain – August 2014 – KDD - 3.1 Ranking

Xavier Amatriain – August 2014 – KDD - Ranking

Key algorithm, sorts titles in most contexts

Ranking

Xavier Amatriain – August 2014 – KDD - Ranking

● Most recommendations are presented in a sorted

list

● Recommendation can be understood as a ranking

problem

● Popularity is the obvious baseline

● Ratings prediction is a clear secondary data input

that allows for personalization

● Many other features can be added

Xavier Amatriain – August 2014 – KDD - Ranking by ratings

4.7

4.6

4.5

4.5

4.5

4.5

4.5

4.5

4.5

4.5

Niche titles

High average ratings… by those who would watch it

Xavier Amatriain – August 2014 – KDD - RMSE

Xavier Amatriain – August 2014 – KDD - Example: Two features, linear model

1

2

Final Ranking

3

4

Linear Model:

f

(u,v) = w p(v) + w r(u,v) + b

rank

1

2

Predicted Rating

5

Popularity - Example: Two features, linear model

1

2

Final Ranking

3

4

Predicted Rating

5

Popularity - Ranking

Ranking

Xavier Amatriain – August 2014 – KDD - Learning to rank

● Machine learning problem: goal is to construct

ranking model from training data

● Training data can be a partial order or binary

judgments (relevant/not relevant).

● Resulting order of the items typically induced

from a numerical score

● Learning to rank is a key element for

personalization

● You can treat the problem as a standard

supervised classification problem

Xavier Amatriain – August 2014 – KDD - Learning to rank - Metrics

● Quality of ranking measured using metrics as

○ Normalized Discounted Cumulative Gain

○ Mean Reciprocal Rank (MRR)

○ Fraction of Concordant Pairs (FCP)

○ Others…

● But, it is hard to optimize machine-learned models

directly on these measures (e.g. non-differentiable)

● Recent research on models that directly optimize

ranking measures

Xavier Amatriain – August 2014 – KDD - Learning to rank - Approaches

1. Pointwise

■ Ranking function minimizes loss function defined on

individual relevance judgment

■ Ranking score based on regression or classification

■ Ordinal regression, Logistic regression, SVM, GBDT, …

2. Pairwise

■ Loss function is defined on pair-wise preferences

■ Goal: minimize number of inversions in ranking

■ Ranking problem is then transformed into the binary

classification problem

■ RankSVM, RankBoost, RankNet, FRank…

Xavier Amatriain – August 2014 – KDD - Learning to rank - Approaches

3. Listwise

■ Indirect Loss Function

− RankCosine: similarity between ranking list and ground truth

as loss function

− ListNet: KL-divergence as loss function by defining a

probability distribution

− Problem: optimization of listwise loss function may not optimize

IR metrics

■ Directly optimizing IR metric (difficult since they are

not differentiable)

− Genetic Programming or Simulated Annealing

− LambdaMart weights pairwise errors in RankNet by IR metric

− Gradient descent on smoothed version of objective function (e.

g. CLiMF or TFMAP)

− SVM-MAP relaxes MAP metric by adding to SVM constraints

− AdaRank uses boosting to optimize NDCG

Xavier Amatriain – August 2014 – KDD

1. The Recommender Problem

2. “Traditional” Methods

2.1. Collaborative Filtering

2.2. Content-based Recommendations

2.3. Hybrid Approaches

3. Beyond Traditional Methods

3.1. Learning to Rank

3.2. Similarity

3.3. Deep Learning

3.4. Social Recommendations

3.5. Page Optimization

3.6. Tensor Factorization and Factorization Machines

3.7. MAB Explore/Exploit

4. References

Xavier Amatriain – August 2014 – KDD- 3.2 Similarity as Recommendation

Xavier Amatriain – August 2014 – KDD - Similars

▪ Displayed in many

different contexts

▪ In response to user

actions/context

(search, queue

add…)

▪ More like… rows

Xavier Amatriain – August 2014 – KDD - Graph-based similarities

0.8

0.3

0.3

0.

0.

4

3

0.2

0.7

Xavier Amatriain – August 2014 – KDD - Example of graph-based similarity: SimRank

▪ SimRank (Jeh & Widom, 02): “two objects are

similar if they are referenced by similar

objects.”

Xavier Amatriain – August 2014 – KDD - Similarity ensembles

• Similarity can refer to different dimensions

• Similar in metadata/tags

• Similar in user play behavior

• Similar in user rating behavior

• …

• Combine them using an ensemble

• Weights are learned using regression over existing

response

• Or… some MAB explore/exploit approach

• The final concept of “similarity” responds to what users vote

as similar

Xavier Amatriain – August 2014 – KDD

1. The Recommender Problem

2. “Traditional” Methods

2.1. Collaborative Filtering

2.2. Content-based Recommendations

2.3. Hybrid Approaches

3. Beyond Traditional Methods

3.1. Learning to Rank

3.2. Similarity

3.3. Deep Learning

3.4. Social Recommendations

3.5. Page Optimization

3.6. Tensor Factorization and Factorization Machines

3.7. MAB Explore/Exploit

4. References

Xavier Amatriain – August 2014 – KDD- 3.3 Deep Learning for

Recommendation

Xavier Amatriain – August 2014 – KDD - Deep Learning for Collaborative

Filtering

● Let’s look at how Spotify uses Recurrent Networks

for Playlist Prediction (http://erikbern.com/?p=589)

Xavier Amatriain – August 2014 – KDD - Deep Learning for Collaborative

Filtering

● In order to predict the next track or movie a user is going to

watch, we need to define a distribution

○ If we choose Softmax as it is common practice, we get:

● Problem: denominator (over all examples is very

expensive to compute)

● Solution: build a tree that implements a hierarchical

softmax

● More details on the blogpost

Xavier Amatriain – August 2014 – KDD - Deep Learning for Content-based

Recommendations

● Another application of Deep Learning to recommendations also

from Spotify

○

http://benanne.github.io/2014/08/05/spotify-cnns.html also Deep content-based

music recommendation, Aäron van den Oord, Sander Dieleman and Benjamin

Schrauwen, NIPS 2013

● Application to coldstart new titles when very little CF information

is available

● Using mel-spectrograms from the audio signal as input

● Training the deep neural network to predict 40 latent factors

coming from Spotify’s CF solution

Xavier Amatriain – August 2014 – KDD - Deep Learning for Content-based

Recommendations

● Network architecture made of 4 convolutional layers + 4 fully

connected dense layers

●

One dimensional convolutional layers using RELUs (Rectified Linear Units) with

activation max(0,x)

●

Max-pooling operations between convolutional layers to downsample intermediate

representations in time, and add time invariance

●

Global temporal pooling layer after last convolutional layer: pools across entire time axis,

computing statistics of the learned features across time:: mean, maximum and L2-norm

●

Globally pooled features are fed into a series of fully-connected layers with 2048 RELUs

Xavier Amatriain – August 2014 – KDD - ANN Training over GPUS and AWS

● How did we implement our ANN solution at Netflix?

○

Level 1 distribution: machines over different AWS regions

○

Level 2 distribution: machines in AWS and same AWS region

■

Use coordination tools

●

Spearmint or similar for parameter optimization

●

Condor, StarCluster, Mesos… for distributed cluster coordination

○

Level 3 parallelization: highly optimized parallel CUDA code on GPUs

http://techblog.netflix.com/2014/02/distributed-neural-networks-with-gpus.html

Xavier Amatriain – August 2014 – KDD

1. The Recommender Problem

2. “Traditional” Methods

2.1. Collaborative Filtering

2.2. Content-based Recommendations

2.3. Hybrid Approaches

3. Beyond Traditional Methods

3.1. Learning to Rank

3.2. Similarity

3.3. Deep Learning

3.4. Social Recommendations

3.5. Page Optimization

3.6. Tensor Factorization and Factorization Machines

3.7. MAB Explore/Exploit

4. References

Xavier Amatriain – August 2014 – KDD- 3.4 Social Recommendations

Xavier Amatriain – August 2014 – KDD - Social and Trust-based

recommenders

● A social recommender system recommends items that are

“popular” in the social proximity of the user.

● Social proximity = trust (can also be topic-specific)

● Given two individuals - the source (node A) and sink (node C) -

derive how much the source should trust the sink.

● Algorithms

○

Advogato (Levien)

○

Appleseed (Ziegler and Lausen)

○

MoleTrust (Massa and Avesani)

○

TidalTrust (Golbeck)

Xavier Amatriain – August 2014 – KDD - Other ways to use Social

● Social connections can be used in

combination with other approaches

● In particular, “friendships” can be fed into

collaborative filtering methods in different

ways

− replace or modify user-user “similarity” by using

social network information

− use social connection as a part of the ML objective

function as regularizer

− ...

Xavier Amatriain – August 2014 – KDD - Demographic Methods

● Aim to categorize the user based on personal

attributes and make recommendation based

on demographic classes

● Demographic groups can come from

marketing research – hence experts decided

how to model the users

● Demographic techniques form people-to-

people correlations

Xavier Amatriain – August 2014 – KDD

1. The Recommender Problem

2. “Traditional” Methods

2.1. Collaborative Filtering

2.2. Content-based Recommendations

2.3. Hybrid Approaches

3. Beyond Traditional Methods

3.1. Learning to Rank

3.2. Similarity

3.3. Deep Learning

3.4. Social Recommendations

3.5. Page Optimization

3.6. Tensor Factorization and Factorization Machines

3.7. MAB Explore/Exploit

4. References

Xavier Amatriain – August 2014 – KDD- 3.5. Page Optimization

Xavier Amatriain – August 2014 – KDD - Page Composition

Xavier Amatriain – August 2014 – KDD - Page Composition

1 personalized

page

10,000s

of

10-40

possible

rows

…

rows

per

device

Variable number of

possible videos per

row (up to

thousands)

Xavier Amatriain – August 2014 – KDD - Page Composition

From “Modeling User Attention and

Interaction on the Web” 2014 - PhD Thesis by Dmitry Lagun (Emory U.)

Xavier Amatriain – August 2014 – KDD - User Attention Modeling

More

likely to

see

Less likely

Xavier Amatriain – August 2014 – KDD - User Attention Modeling

From “Modeling User Attention and

Interaction on the Web” 2014 - PhD Thesis by Dmitry Lagun (Emory U.)

Xavier Amatriain – August 2014 – KDD - Page Composition

vs.

Accurate

Diverse

vs.

Discovery

Continuation

vs.

Depth

Coverage

vs.

Freshness

Stability

vs.

Recommendations

Tasks

● To put things together we need to combine different elements

○ Navigational/Attention Model

○ Personalized Relevance Model

○ Diversity Model

Xavier Amatriain – August 2014 – KDD

1. The Recommender Problem

2. “Traditional” Methods

2.1. Collaborative Filtering

2.2. Content-based Recommendations

2.3. Hybrid Approaches

3. Beyond Traditional Methods

3.1. Learning to Rank

3.2. Similarity

3.3. Deep Learning

3.4. Social Recommendations

3.5. Page Optimization

3.6. Tensor Factorization and Factorization Machines

3.7. MAB Explore/Exploit

4. References

Xavier Amatriain – August 2014 – KDD- 3.6 Tensor Factorization &

Factorization Machines

Xavier Amatriain – August 2014 – KDD - N-dimensional model

Xavier Amatriain – August 2014 – KDD - Tensor Factorization

HOSVD: Higher Order Singular

Value Decomposition

Xavier Amatriain – August 2014 – KDD - Tensor Factorization

Where:

● We can use a simple squared error loss function:

● Or the absolute error loss

● The loss function over all users becomes

Xavier Amatriain – August 2014 – KDD - Factorization Machines

• Generalization of regularized matrix (and tensor)

factorization approaches combined with linear (or

logistic) regression

• Problem: Each new adaptation of matrix or tensor

factorization requires deriving new learning

algorithms

• Hard to adapt to new domains and add data sources

• Hard to advance the learning algorithms across approaches

• Hard to incorporate non-categorical variables

Xavier Amatriain – August 2014 – KDD - Factorization Machines

• Approach: Treat input as a real-valued feature vector

• Model both linear and pair-wise interaction of features (i.e.

polynomial regression)

• Traditional machine learning will overfit

• Factor pairwise interactions between features

• Reduced dimensionality of interactions promote generalization

• Different matrix factorizations become different feature

representations

• Tensors: Additional higher-order interactions

• Combines “generality of machine learning/regression

with quality of factorization models”

Xavier Amatriain – August 2014 – KDD - Factorization Machines

• Each feature gets a weight value and a factor vector

• O( ) parameters

• Model equation:

O( 2)

O( )

Xavier Amatriain – August 2014 – KDD - Factorization Machines

▪ Two categorical variables ( , ) encoded as real values:

▪ FM becomes identical to MF with biases:

Xavier Amatriain – August 2014 – KDD - Factorization Machines

▪ Makes it easy to add a time signal

▪ Equivalent equation:

From Rendle (2012) KDD Tutorial

Xavier Amatriain – August 2014 – KDD - Factorization Machines (Rendle, 2010)

• L2 regularized

• Regression: Optimize RMSE

• Classification: Optimize

logistic log-likelihood

• Ranking: Optimize scoresGradient:

• Can be trained using:

• SGD

• Adaptive SGD

• ALS

Least squares SGD:

• MCMC

Xavier Amatriain – August 2014 – KDD - Factorization Machines (Rendle, 2010)

• Learning parameters:

• Number of factors

• Iterations

• Initialization scale

• Regularization (SGD, ALS) – Multiple

• Step size (SGD, A-SGD)

• MCMC removes the need to set those

hyperparameters

Xavier Amatriain – August 2014 – KDD

1. The Recommender Problem

2. “Traditional” Methods

2.1. Collaborative Filtering

2.2. Content-based Recommendations

2.3. Hybrid Approaches

3. Beyond Traditional Methods

3.1. Learning to Rank

3.2. Similarity

3.3. Deep Learning

3.4. Social Recommendations

3.5. Page Optimization

3.6. Tensor Factorization and Factorization Machines

3.7. MAB Explore/Exploit

4. References

Xavier Amatriain – August 2014 – KDD- 3.7 MAB Explore/Exploit

Xavier Amatriain – August 2014 – KDD - Explore/Exploit

• One of the key issues when building any kind of

personalization algorithm is how to trade off:

• Exploitation: Cashing in on what we know about

the user right now

• Exploration: Using the interaction as an

opportunity to learn more about the user

• We need to have informed and optimal strategies to

drive that tradeoff

• Solution: pick a reasonable set of candidates and

show users only “enough” to gather information

on them

Xavier Amatriain – August 2014 – KDD - Multi-armed Bandits

• Given possible strategies/candidates (slot machines) pick the

arm that has the maximum potential of being good (minimize

regret)

• Naive strategy:

• Explore with a small probability (e.g. 5%) -> choose an

arm at random

• Exploit with a high probability (1- ) (e.g. 95%) -> choose

the best-known arm so far

• Translation to recommender systems

• Choose an arm = choose an item/choose an algorithm

(MAB testing)

Xavier Amatriain – August 2014 – KDD - Multi-armed Bandits

• Better strategies not only take into account the mean of the

posterior, but also the variance

• Upper Confidence Bound (UCB)

• Show item with maximum score

• Score = Posterior mean +

• Where can be computed differently depending on the

variant of UCB (e.g. to the number of trials or the

measured variance)

• Thompson Sampling

• Given a posterior distribution

• Sample on each iteration and choose the action that

maximizes the expected reward

Xavier Amatriain – August 2014 – KDD - Multi-armed Bandits

Xavier Amatriain – August 2014 – KDD

1. The Recommender Problem

2. “Traditional” Methods

2.1. Collaborative Filtering

2.2. Content-based Recommendations

2.3. Hybrid Approaches

3. Beyond Traditional Methods

3.1. Learning to Rank

3.2. Similarity

3.3. Deep Learning

3.4. Social Recommendations

3.5. Page Optimization

3.6. Tensor Factorization and Factorization Machines

3.7. MAB Explore/Exploit

4. References

Xavier Amatriain – August 2014 – KDD- 4. References

Xavier Amatriain – August 2014 – KDD - References

● "Recommender Systems Handbook." Ricci, Francesco, Lior Rokach,

Bracha Shapira, and Paul B. Kantor. (2010).

● “Recommender systems: an introduction”. Jannach, Dietmar, et al.

Cambridge University Press, 2010.

● “Toward the Next Generation of Recommender Systems: A Survey of the

State-of-the-Art and Possible Extensions”. G. Adomavicious and A.

Tuzhilin. 2005. IEEE Transactions on Knowledge and Data Engineering,

17 (6)

● “Item-based Collaborative Filtering Recommendation Algorithms”, B.

Sarwar et al. 2001. Proceedings of World Wide Web Conference.

● “Lessons from the Netflix Prize Challenge.”. R. M. Bell and Y. Koren.

SIGKDD Explor. Newsl., 9(2):75–79, December 2007.

● “Beyond algorithms: An HCI perspective on recommender systems”. K.

Swearingen and R. Sinha. In ACM SIGIR 2001 Workshop on

Recommender Systems

● “Recommender Systems in E-Commerce”. J. Ben Schafer et al. ACM

Conference on Electronic Commerce. 1999-

● “Introduction to Data Mining”, P. Tan et al. Addison Wesley. 2005

Xavier Amatriain – August 2014 – KDD - References

● “Evaluating collaborative filtering recommender systems”. J. L. Herlocker,

J. A. Konstan, L. G. Terveen, and J. T. Riedl. ACM Trans. Inf. Syst., 22(1):

5–53, 2004.

● “Trust in recommender systems”. J. O’Donovan and B. Smyth. In Proc. of

IUI ’05, 2005.

● “Content-based recommendation systems”. M. Pazzani and D. Billsus. In

The Adaptive Web, volume 4321. 2007.

● “Fast context-aware recommendations with factorization machines”. S.

Rendle, Z. Gantner, C. Freudenthaler, and L. Schmidt-Thieme. In Proc. of

the 34th ACM SIGIR, 2011.

● “Restricted Boltzmann machines for collaborative filtering”. R.

Salakhutdinov, A. Mnih, and G. E. Hinton.In Proc of ICML ’07, 2007

● “Learning to rank: From pairwise approach to listwise approach”. Z. Cao

and T. Liu. In In Proceedings of the 24th ICML, 2007.

● “Introduction to Data Mining”, P. Tan et al. Addison Wesley. 2005

Xavier Amatriain – August 2014 – KDD - References

● D. H. Stern, R. Herbrich, and T. Graepel. “Matchbox: large scale online

bayesian recommendations”. In Proc.of the 18th WWW, 2009.

● Koren Y and J. Sill. “OrdRec: an ordinal model for predicting personalized

item rating distributions”. In Rec-Sys ’11.

● Y. Koren. “Factorization meets the neighborhood: a multifaceted

collaborative filtering model”. In Proceedings of the 14th ACM SIGKDD,

2008.

● Yifan Hu, Y. Koren, and C. Volinsky. “Collaborative Filtering for Implicit

Feedback Datasets”. In Proc. Of the 2008 Eighth ICDM

● Y. Shi, A. Karatzoglou, L. Baltrunas, M. Larson, N. Oliver, and A. Hanjalic.

“CLiMF: learning to maximize reciprocal rank with collaborative less-is-

more filtering”. In Proc. of the sixth Recsys, 2012.

● Y. Shi, A. Karatzoglou, L. Baltrunas, M. Larson,A. Hanjalic, and N. Oliver.

“TFMAP: optimizing MAP for top-n context-aware recommendation”. In

Proc. Of the 35th SIGIR, 2012.

● C. Burges. 2010. From RankNet to LambdaRank to LambdaMART: An

Overview. MSFT Technical Report

Xavier Amatriain – August 2014 – KDD - References

● A. Karatzoglou, X. Amatriain, L. Baltrunas, and N. Oliver. “Multiverse

recommendation: n-dimensional tensor factorization for context-aware

collaborative filtering”. In Proc. of the fourth ACM Recsys, 2010.

● S. Rendle, Z. Gantner, C. Freudenthaler, and L. Schmidt-Thieme. “Fast

context-aware recommendations with factorization machines”. In Proc. of

the 34th ACM SIGIR, 2011.

● S.H. Yang, B. Long, A.J. Smola, H. Zha, and Z. Zheng. “Collaborative

competitive filtering: learning recommender using context of user choice. In

Proc. of the 34th ACM SIGIR, 2011.

● N. N. Liu, X. Meng, C. Liu, and Q. Yang. “Wisdom of the better few: cold

start recommendation via representative based rating elicitation”. In Proc.

of RecSys’11, 2011.

● M. Jamali and M. Ester. “Trustwalker: a random walk model for combining

trust-based and item-based recommendation”. In Proc. of KDD ’09, 2009.

Xavier Amatriain – August 2014 – KDD - References

● J. Noel, S. Sanner, K. Tran, P. Christen, L. Xie, E. V. Bonilla, E.

Abbasnejad, and N. Della Penna. “New objective functions for social

collaborative filtering”. In Proc. of WWW ’12, pages 859–868, 2012.

● X. Yang, H. Steck, Y. Guo, and Y. Liu. “On top-k recommendation using

social networks”. In Proc. of RecSys’12, 2012.

● Dmitry Lagun. “Modeling User Attention and Interaction on the Web” 2014

PhD Thesis (Emory Un.)

● “Robust Models of Mouse Movement on Dynamic Web Search Results

Pages - F. Diaz et al. In Proc. of CIKM 2013

● Amr Ahmed et al. “Fair and balanced”. In Proc. WSDM 2013

● Deepak Agarwal. 2013. Recommending Items to Users: An Explore Exploit

Perspective. CIKM ‘13

Xavier Amatriain – August 2014 – KDD - Online resources

● Recsys Wiki: http://recsyswiki.com/

● Recsys conference Webpage: http://recsys.acm.org/

● Recommender Systems Books Webpage: http://www.

recommenderbook.net/

● Mahout Project: http://mahout.apache.org/

● MyMediaLite Project: http://www.mymedialite.net/

Xavier Amatriain – August 2014 – KDD