This page presents the content of http://www.slideshare.net/akyrola/largescale-recommendation-systems-on-just-a-pc.


Uploaded about 3 years ago (2013/10/13) in Technology.

My keynote at the large-scale recommender systems workshop at Recsys 2013.

- Large-scale Recommender Systems on Just a PC

LSRS 2013 keynote

(RecSys ’13 Hong Kong)

Aapo Kyrölä

Ph.D. candidate @ CMU

http://www.cs.cmu.edu/~akyrola

Twitter: @kyrpov

Big Data – small machine - My Background

• Academic: 5th year Ph.D. @ Carnegie Mellon.

Advisors: Guy Blelloch, Carlos Guestrin (UW)


+ Shotgun : Parallel L1-regularized regression solver (ICML 2011).

+ Internships at MSR Asia (2011) and Twitter (2012)

• Startup Entrepreneur

Habbo: founded 2000 - Outline of this talk

1. Why single-computer computing?

2. Introduction to graph computation and

GraphChi

3. Recommender systems with GraphChi

4. Future directions & Conclusion - Large-Scale Recommender Systems on Just a PC

Why on a single machine?

Can’t we just use the Cloud? - Why use a cluster?

Two reasons:

1. One computer cannot handle my problem in a

reasonable time.

2. I need to solve the problem very fast. - Why use a cluster?

Two reasons:

1. One computer cannot handle my problem in a

reasonable time.

Our work expands the space of feasible (graph) problems on

one machine:

- Our experiments use the same or bigger graphs than previous papers on distributed graph computation. (+ we can do the Twitter

graph on a laptop)

- Most data not that “big”.

2. I need to solve the problem very fast.

Our work raises the bar on required performance for a

“complicated” system. - Benefits of single machine

systems

Assuming it can handle your big

problems…

1. Programmer productivity

– Global state

– Can use “real data” for development

2. Inexpensive to install and administer; uses less power.

3. Scalability. - Efficient Scaling

[Figure: task timelines comparing scaling. A distributed graph system gets (significantly) less than 2x throughput with 2x machines, while replicated single-computer systems (each capable of big tasks) get exactly 2x throughput when going from 6 to 12 machines.]

- GRAPH COMPUTATION AND GRAPHCHI - Why graphs for recommender systems?

• Graph = matrix: edge(u,v) = M[u,v]

– Note: always sparse graphs

• Intuitive, human-understandable

representation

– Easy to visualize and explain.

• Unifies collaborative filtering (typically matrix

based) with recommendation in social

networks.

– Random walk algorithms.

• Local view: vertex-centric computation - Vertex-Centric Computational Model

• Graph G = (V, E)

– directed edges: e = (source, destination)

– each edge and vertex

associated with a value

(user-defined type)

– vertex and edge values can

be modified

• (structure modification also supported)

[Figure: example graph with vertices A and B and a Data value attached to every vertex and edge.]

GraphChi – Aapo Kyrola - Vertex-centric Programming

• “Think like a vertex”

• Popularized by the Pregel and GraphLab

projects
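To make the model concrete, here is a minimal vertex-centric engine in Python; the toy graph, the `my_func` update, and the driver loop are all illustrative assumptions, not GraphChi’s actual API:

```python
# Toy vertex-centric execution: an update function sees one vertex and
# its incident edges only ("think like a vertex").
# Illustrative sketch -- not GraphChi's actual API.

graph = {                          # adjacency: vertex -> out-neighbors
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}
value = {v: 1.0 for v in graph}    # a value per vertex
edge_value = {}                    # a value per directed edge

def my_func(v):
    """MyFunc(vertex) { // modify neighborhood }"""
    share = value[v] / len(graph[v])   # split the vertex value
    for w in graph[v]:
        edge_value[(v, w)] = share     # write to the out-edges

for v in graph:                    # the engine calls the update on every vertex
    my_func(v)

print(edge_value[("A", "B")])      # 0.5
```

The point of the restriction is that the engine, not the programmer, decides the order and parallelism of the per-vertex calls.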

[Figure: graph with a Data value on every vertex and edge; the update function is applied to one vertex at a time.]

MyFunc(vertex) { // modify neighborhood }

- What is GraphChi

Both in OSDI ’12! - The Main Challenge of Disk-based Graph Computation: Random Access

Random access is the bottleneck: “reasonable performance” needs ~5–10 M random edge accesses / sec, but storage delivers far less: 100s of reads/writes per sec (rotational disk), ~100K reads / sec (commodity SSD), ~1M reads / sec (high-end arrays). - Details: Kyrola, Blelloch, Guestrin: “Large-scale graph computation on just a PC” (OSDI 2012)

Parallel Sliding Windows

Only P large sequential reads for each interval (sub-graph); P² reads on one full pass.
P2 reads on one full pass. - GraphChi Program Execution

For T iterations:

For p=1 to P

For v in interval(p)

updateFunction(v)

For T iterations:

For v=1 to V

updateFunction(v)

“Asynchronous”: updates immediately

visible (vs. bulk-synchronous). - Performance

GraphChi can compute on the full Twitter follow-graph with just a standard laptop.

~ as fast as a very large Hadoop cluster! (size of the graph, Fall 2013: > 20B edges [Gupta et al. 2013]) - GraphChi is Open Source

• C++ and Java versions on GitHub:

http://github.com/graphchi

– Java-version has a Hadoop/Pig wrapper.

• If you really really want to use Hadoop. - RECSYS MODEL TRAINING

WITH GRAPHCHI - Overview of Recommender Systems

for GraphChi

• Collaborative Filtering toolkit (next

slide)

• Link prediction in large networks

– Random-walk based approaches (Twitter)

– Talk on Wednesday. - GraphChi’s Collaborative Filtering

Toolkit

• Developed by Danny Bickson

(CMU / GraphLab Inc)

• Includes:

– Alternating Least Squares (ALS)

– Sparse-ALS

– SVD++

– LibFM (factorization machines)

– GenSGD

– Item-similarity based methods

– PMF

– CliMF (contributed by Mark Levy)

– ….

See Danny’s blog for more information: http://bickson.blogspot.com/2012/12/collaborative-filtering-with-graphchi.html

Note: In the C++ version. Java-version in development by a CMU team. - TWO EXAMPLES: ALS AND ITEM-BASED CF - Example: Alternating Least Squares Matrix Factorization (ALS)

• Task: Predict ratings for items (movies)

by users.

• Model:

– Latent factor model (see next slide)

Reference: Y. Zhou, D. Wilkinson, R. Schreiber, R. Pan: “Large-Scale Parallel Collaborative Filtering for the Netflix Prize” (2008) - ALS: Product – Item bipartite graph

[Figure: user–movie bipartite graph with example latent-factor values; movies shown: Women on the Verge of a Nervous Breakdown, The Celebration, City of God, Wild Strawberries, La Dolce Vita; edge labels are ratings.]

User’s rating of a movie modeled as a dot-product: <factor(user), factor(movie)> - ALS: GraphChi implementation

• Update function handles one vertex a time (user

or movie)

• For each user:

– Estimate latent(user): minimize least squares of

dot-product predicted ratings

• GraphChi executes the update function for each

vertex (in parallel), and loads edges (ratings) from

disk

– Latent factors in memory: need O(V) memory.

– If factors don’t fit in memory, they can be replicated to edges and thus stored on disk.
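The per-user least-squares step described above can be sketched in pure Python for D = 2; the regularization constant, names, and toy item factors below are assumptions for illustration, not the toolkit’s code:

```python
# One ALS user-update: fix the item factors, solve the regularized
# least-squares problem for the user's latent vector (D = 2, solved
# via the normal equations and Cramer's rule).
# Hypothetical sketch of the update step, not the toolkit's code.

LAMBDA = 0.1   # regularization strength (assumed value)

def update_user(ratings, item_factors):
    """ratings: {item: rating}; item_factors: {item: (f0, f1)}."""
    # Build the normal equations (M^T M + lambda*I) u = M^T r
    a00 = a01 = a11 = b0 = b1 = 0.0
    for item, r in ratings.items():
        f0, f1 = item_factors[item]
        a00 += f0 * f0; a01 += f0 * f1; a11 += f1 * f1
        b0 += f0 * r;  b1 += f1 * r
    a00 += LAMBDA; a11 += LAMBDA
    det = a00 * a11 - a01 * a01
    return ((b0 * a11 - b1 * a01) / det,   # Cramer's rule for the 2x2 system
            (a00 * b1 - a01 * b0) / det)

items = {"celebration": (1.0, 0.0), "city_of_god": (0.0, 1.0)}
u = update_user({"celebration": 4.0, "city_of_god": 2.0}, items)

# Predicted rating is the dot product <factor(user), factor(movie)>:
pred = u[0] * items["celebration"][0] + u[1] * items["celebration"][1]
print(round(pred, 2))   # 3.64 (shrunk toward 0 by the regularizer)
```

GraphChi runs this update for every user vertex (then symmetrically for every movie vertex), streaming the ratings in as edges.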

Scales to very large problems! - ALS: Performance

Matrix Factorization (Alternating Least Squares)

Netflix (99M edges), D=20

Remark: Netflix is not a big problem, but GraphChi will scale at most linearly with input size (ALS is CPU bound, so runtime should be sub-linear in #ratings). - Example: Item-Based CF

be sub-linear in #ratings). - Example: Item Based-CF

• Task: compute a similarity score [e,g.

Jaccard] for each movie-pair that has at least

one viewer in common.

– Similarity(X, Y) ~ # common viewers

– Output top K similar items for each item to a file.

– … or: create edge between X, Y containing the

similarity.

• Problem: enumerating all pairs takes too

much time.

[Figure: movie–user graph with Women on the Verge of a Nervous Breakdown, The Celebration, City of God, Wild Strawberries, La Dolce Vita.]

Solution: Enumerate all triangles of the graph.

New problem: how to enumerate triangles if the graph does not fit in RAM? - Enumerating Triangles (Item-CF)

• Triangles with edge (u, v) =

intersection(neighbors(u),

neighbors(v))

• Iterative memory efficient solution (next

slide) - Algorithm:

• Let pivots be a subset of the vertices.

• Load all neighbor-lists (adjacency lists) of the pivots into RAM.

• Use GraphChi to load all vertices from disk, one by one, and compare their adjacency lists to the pivots’ adjacency lists (similar to a merge).

• Repeat with a new subset of pivots.
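A sketch of the pivot idea in Python; the in-memory toy graph and batch size stand in for GraphChi’s disk streaming:

```python
# Pivot-based triangle enumeration: keep a batch of adjacency lists
# ("pivots") in RAM, stream every vertex past them, and intersect
# sorted neighbor lists merge-style. Common neighbors of an item pair
# are exactly the common viewers used for item-CF similarity.
# Illustrative sketch; GraphChi streams non-pivot vertices from disk.

graph = {
    0: [1, 2, 3],
    1: [0, 2],
    2: [0, 1, 3],
    3: [0, 2],
}

def sorted_intersection(a, b):
    """Merge-style intersection of two sorted lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

triangles = set()
vertices = sorted(graph)
batch = 2                                  # pivots per pass (what fits in RAM)
for start in range(0, len(vertices), batch):
    pivots = {v: sorted(graph[v]) for v in vertices[start:start + batch]}
    for u in vertices:                     # streamed "from disk" one by one
        for p, p_adj in pivots.items():
            if p < u and p in graph[u]:    # edge (p, u); visit each pair once
                for w in sorted_intersection(p_adj, sorted(graph[u])):
                    triangles.add(tuple(sorted((p, u, w))))

print(len(triangles))   # 2
```

Each pass costs one sequential sweep over the graph, so total work is (number of pivot batches) × (one full scan), with RAM only needed for the current pivot batch.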

Triangle Counting Performance

[Chart: triangle counting runtime on twitter-2010 (1.5B edges).] - FUTURE DIRECTIONS &

FINAL REMARKS - Single-Machine Computing in

Production?

• GraphChi supports incremental

computation with dynamic graphs:

– Can keep on running indefinitely, adding new

edges to the graph: a constantly fresh model.

– However, requires engineering – not included

in the toolkit.

• Compare to a cluster-based system (such

as Hadoop) that needs to compute from

scratch. - Unified Recsys Platform for

GraphChi?

• Working with masters students at CMU.

• Goal: ability to easily compare different

algorithms, parameters

– Unified input, output.

– General programmable API (not just file-based)

– Evaluation process: Several evaluation metrics; Cross-

validation, held-out data…

– Run many algorithm instances in parallel, on same

graph.

– Java.

• Scalable from the get-go. - DataDescriptor

data deﬁnition

column1 : categorical

Input data

column2: real

column3: key

column4: categorical

Algorithm X: Input

Algorithm Input Descriptor

map(input: DataDescriptor)

GraphChi

Preprocessor

aux

data

GraphChi Input - aux

data

Disk

GraphChi Input

Algorithm X Training

Algorithm Y Training

Algorithm Z Training

Program

Program

Program

Held-out

data (test

Algorithm X Predictor

data)

training

metrics

test quality

metrics - Recent developments: Disk-based

Graph Computation

• Recently two disk-based graph computation

systems published:

– TurboGraph (KDD’13)

– X-Stream (SOSP’13 in October)

• Significantly better performance than GraphChi

on many problems

– Avoid preprocessing (“sharding”)

– But GraphChi can do some computation that X-Stream cannot (triangle counting and related);

TurboGraph requires SSD

– Hot research area! - Do you need GraphChi – or any

system?

• Heck, for many algorithms, you can just

mmap() over your (binary) adjacency

list / sparse matrix, and write a for-loop.

– See Lin, Chau, Kang: “Leveraging Memory Mapping for Fast and Scalable Graph Computation on a PC” (Big Data ’13)
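For instance, a sketch of that mmap()-and-for-loop approach in Python, with a hypothetical file layout (per vertex: a uint32 degree followed by that many uint32 neighbor ids, little-endian):

```python
# mmap() a binary adjacency list and loop over it -- no framework needed.
# The file layout here is an assumption for illustration.
import mmap
import os
import struct
import tempfile

# Write a tiny graph in that layout: 0 -> [1, 2], 1 -> [2], 2 -> []
path = os.path.join(tempfile.mkdtemp(), "adj.bin")
with open(path, "wb") as f:
    for neigh in ([1, 2], [2], []):
        f.write(struct.pack("<I", len(neigh)))
        f.write(struct.pack("<%dI" % len(neigh), *neigh))

degree_sum = 0
with open(path, "rb") as f:
    buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)  # OS pages data in
    off = 0
    while off < len(buf):
        (deg,) = struct.unpack_from("<I", buf, off)
        off += 4
        neighbors = struct.unpack_from("<%dI" % deg, buf, off)
        off += 4 * deg
        degree_sum += deg          # any per-vertex computation goes here
    buf.close()

print(degree_sum)                  # 3 directed edges total
```

The operating system’s page cache does the buffering; for algorithms that scan vertices sequentially, this is often all the “system” you need.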

• Obviously good to have a common API

– And some algos need more advanced

solutions (like GraphChi, X-Stream,

TurboGraph)

Beware of the hype! - Conclusion

• Very large recommender algorithms can now be

run on just your PC or laptop.

– Additional performance from multi-core parallelism.

– Great for productivity – scale by replicating.

• In general, good single-machine scalability requires care with data structures and memory management: natural with C/C++, while with Java (etc.) you need low-level byte massaging.

– Frameworks like GraphChi hide the low-level.

• More work needed to “productize” current work. - Thank you!

Aapo Kyrölä

Ph.D. candidate @ CMU – soon to

graduate! (Currently visiting UW)

http://www.cs.cmu.edu/~akyrola

Twitter: @kyrpov