This page reproduces the content of http://www.slideshare.net/doryokujin/the-definition-of-graphdb.


- About Me

・Takahiro Inoue(age 26)

・twitter: doryokujin

・Majored in Math (Statistics & Graph Algorithm)

・Data Scientist

・Leader of MongoDB JP

・Interest: DataProcessing, GraphDB

- Agenda

(1) Graph Type for GraphDB

∼Which Graph is Better for GraphDB ?∼

(2) Graph Traversals

∼ Graph Query Graph Traversal ∼

(3) Index Free Adjacency

∼The Key of Definition of GraphDB∼

(4) Other Topics

- (1) Graph Class for GraphDB

∼Which Graph is Better for GraphDB ?∼

- Definition of Graph

・Graph is an ordered pair G = (V, E)

・Set V of Nodes

・Set E of Edges

- 2 Element Subsets of V

- Representing Relationship Between Nodes

- Directed or Undirected

- Def. Undirected Graph

[Undirected Graph]

・Edges have no orientation

・Edges are not ordered pairs but sets {u, v}, i.e. edge (a, b) = (b, a)

・All nodes have the same object type

・All edges have the same relationship

G = (V, E)

- Def. Directed Graph

[Directed Graph (Digraph)]

・Ordered pair D = (V, A)

・A: set of ordered pairs of nodes, called arrows

・All nodes have the same object type

・All edges have the same relationship

[Figure: D = (V, A); label: symmetric]

- Example: (Un)Directed Graph
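The difference between the two definitions can be sketched in a few lines of Python: an undirected edge is an unordered set {u, v}, while a directed edge is an ordered pair (u, v). The user names and helper functions below are invented for illustration and are not part of the slides.

```python
# Undirected edges are unordered sets {u, v}; directed edges are ordered pairs (u, v).
# All user names here are hypothetical.
facebook_edges = {frozenset({"alice", "bob"}), frozenset({"bob", "carol"})}
twitter_edges = {("alice", "bob"), ("carol", "bob")}


def are_friends(u, v):
    """Symmetric: the order of u and v does not matter."""
    return frozenset({u, v}) in facebook_edges


def follows(u, v):
    """Asymmetric: (u, v) being present says nothing about (v, u)."""
    return (u, v) in twitter_edges


print(are_friends("alice", "bob"), are_friends("bob", "alice"))
print(follows("alice", "bob"), follows("bob", "alice"))
```

In this toy data, friendship holds in both directions, while "alice" follows "bob" without being followed back.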

[Facebook]

・relationship of all edges: friend

・Facebook friendship is symmetric

・node object type: user

[Twitter]

・relationship of all edges: follow

・Twitter follow action is asymmetric

・node object type: user

- Def. Mixed Graph, Multigraph

[Mixed Graph]

・Some edges may be directed and some may be undirected

G = (V, E, A)

[Multigraph]

・Includes (directed/undirected) loop edges and multiple edges

D = (V, A)

[Figure labels: multiple, loop]

- Common Representation

・These types of graphs can share a common representation

・undirected edge --> 2 directed edges

・symmetric edge --> 2 asymmetric directed edges

・loops and multiple edges are allowed

- Common Representation
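A minimal sketch of this conversion, assuming edges are given as (u, v) pairs in a list so that multiple edges are preserved. The function name `to_directed` and the choice to emit a single arc for a loop are assumptions made for this example.

```python
def to_directed(undirected_edges):
    """Replace each undirected edge {u, v} by the two arcs (u, v) and (v, u).

    A list (not a set) is returned so that multiple edges survive.
    """
    arcs = []
    for u, v in undirected_edges:
        arcs.append((u, v))
        if u != v:  # assumption: a loop is represented by one arc
            arcs.append((v, u))
    return arcs


# Two parallel edges between a and b, plus a loop on c.
print(to_directed([("a", "b"), ("a", "b"), ("c", "c")]))
```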

[Figure labels: symmetric, undirected, multiple, loop]

・No undirected edge

・No symmetric edge

- Def. Single-Relational Graph

[Single-Relational Structures]

・Multigraph

・All edges must be the same relationship

・All nodes must be the same object type

・All graphs introduced so far are single-relational graphs

Is this class sufficient for a graph database?

- Def. Multi-Relational Graph

[Multi-Relational Structures]

・More flexible than single-relational structures

・All edges are directed and asymmetric

・Each edge can have a different relationship

・Each node can have a different object type

- Example: Multi-Relational Graph
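One way to picture a multi-relational graph is as a list of labeled, directed edges (tail, label, head). The sketch below is illustrative only: the `Edge` tuple, the user ids and the particular edges are assumptions, with labels borrowed from the Twitter example.

```python
from collections import namedtuple

# A directed edge that also carries its relationship type.
Edge = namedtuple("Edge", ["tail", "label", "head"])

# Hypothetical users u1..u3 and hypothetical edges between them.
edges = [
    Edge("u1", "Reply", "u2"),
    Edge("u2", "RT", "u3"),
    Edge("u1", "DM", "u3"),
    Edge("u3", "Block", "u1"),
]


def out_neighbors(node, label):
    """Heads of all edges leaving `node` with the given relationship."""
    return [e.head for e in edges if e.tail == node and e.label == label]


print(out_neighbors("u1", "Reply"))
print(out_neighbors("u1", "Block"))
```

Because the label is part of each edge, the same pair of users can be connected by several different relationships at once.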

[Twitter]

[Figure: users connected by edges labeled Reply, DM, RT and Block]

・4 types of relationships: Reply, DM, RT, Block

・Every node still has the same object type

- Example: Multi-Relational Graph

[Livlis] http://www.livlis.com/

[Figure: edges labeled follow, invite, message, like!, want!, bought! and exhibit]

・Many types of relationship

・Connection: user --> item

・Connection: user <--> user

- Def. Property Graph

[Property Graph]

・Multi-Relational Graph

・Each node and edge has some properties

・Each property is a key-value pair, and the model is schema-free
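A property graph can be sketched with plain dictionaries. The ids and values below are taken from the slide's figure, but which value belongs to which node or edge is an assumption, as are the `lang` property and the helper `edge_properties`.

```python
# Nodes: id -> key-value properties (assignment of values is assumed).
nodes = {
    "id_A": {"follow": 100, "follower": 200},
    "id_B": {"follow": 500, "follower": 1000},
}

# Edges: directed, labeled, and carrying their own properties.
edges = [
    {"tail": "id_A", "label": "follow", "head": "id_B", "since": "2011/01/23"},
    {"tail": "id_B", "label": "follow", "head": "id_A", "since": "2011/06/01"},
]

# Schema-free: one element may gain a property that the others lack.
nodes["id_A"]["lang"] = "ja"  # hypothetical extra property


def edge_properties(tail, head, label):
    """Return the non-structural properties of matching edges."""
    return [
        {k: v for k, v in e.items() if k not in ("tail", "head", "label")}
        for e in edges
        if e["tail"] == tail and e["head"] == head and e["label"] == label
    ]


print(edge_properties("id_A", "id_B", "follow"))
```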

[Figure: two nodes with ids id_A and id_B, joined by follow edges. Property keys and values shown: follow (100, 500), follower (200, 1000), since (2011/01/23, 2011/01/01, 2011/06/01)]

- Example: Property Graph

[Livlis]

[Figure: the Livlis graph drawn as a property graph. Edge labels: follow, invite, message, like!, want!, bought!, exhibit. Property keys and values shown: name (A, B), sex (man), follow (100, 10), follower (200, 20), favorite (50), access (500), wanted (10), liked (30), price ($50), since (01/01/01)]

- Def. Hyper Graph

[Hyper Graph]

・Set V of Nodes

・Set E of non-empty subsets of V

・i.e. an edge can connect more than two nodes

・Every node or edge carries an arbitrary value as payload

H = (V, E)

・Property Graph Hyper Graph

Sones: manage edge types with

GraphDB 2.1

- Summary

・Property graphs have a flexible representation

・Key features:

- All edges are directed and asymmetric

- Each edge can have a different relationship

- Each node can have a different object type

- All elements have properties in key-value style

・Many GraphDBs support the Property Graph model

※ Some GraphDBs support the Hyper Graph model

- (2) Graph Traversals

∼ Graph Query Graph Traversal ∼

- Graph Traversals

Property Graph Algorithms

Graph Query Graph Traversal

・Not a global search, as in an RDBMS or other NoSQL stores

・Instead, traverse the graph starting from a root node

・Locality is very important

- Graph Traversals

・To traverse a graph is to process every node in the graph exactly once

・The two most common traversal patterns are breadth-first traversal and depth-first traversal

・At each step, the traverser moves to its adjacent vertices

・Repeat the step a specified number of times or until some condition is fulfilled

- Graph Traversals
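The two traversal patterns can be sketched as follows. This is a generic illustration, not code from the slides: the adjacency dictionary, the function names and the `max_depth` stopping condition are all assumptions.

```python
from collections import deque


def bfs(adj, root, max_depth=None):
    """Breadth-first: visit each reachable node once, nearest nodes first."""
    visited, seen = [root], {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # stop expanding after the requested number of steps
        for nb in adj.get(node, []):
            if nb not in seen:
                seen.add(nb)
                visited.append(nb)
                queue.append((nb, depth + 1))
    return visited


def dfs(adj, root):
    """Depth-first: follow one branch as far as possible before backtracking."""
    visited, seen, stack = [], set(), [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        visited.append(node)
        # Reverse so that the first-listed neighbor is explored first.
        stack.extend(reversed(adj.get(node, [])))
    return visited


# Hypothetical toy graph.
adj = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": []}
print(bfs(adj, "A"))               # ['A', 'B', 'C', 'D', 'E']
print(bfs(adj, "A", max_depth=1))  # ['A', 'B', 'C']
print(dfs(adj, "A"))               # ['A', 'B', 'D', 'C', 'E']
```

Both functions touch only the neighbors of nodes already reached, which is the locality the slides emphasize.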

・Single step traversal: from element i to element j, where i, j ∈ (V ∪ E)

・Graph traversals of arbitrary length can be defined by composing single step traversals
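As a rough illustration of composing single steps, the sketch below builds a "names of my friends" traversal out of four small functions, composed right to left as in the paper excerpt quoted on this slide. The data layout, the function names and all names other than Alberto Pepe are assumptions made for the example.

```python
from functools import reduce

# Hypothetical data: vertex id -> properties, and (tail, label, head) edges.
vertices = {
    1: {"name": "Alberto Pepe"},
    2: {"name": "friend-2"},
    3: {"name": "friend-3"},
    4: {"name": "friend-4"},
}
edges = [(1, "friend", 2), (1, "friend", 3), (1, "friend", 4), (1, "colleague", 2)]


def e_out(vs):
    """Vertices -> their outgoing edges."""
    return [e for e in edges if e[0] in vs]


def friend_only(es):
    """Edges -> only those labeled 'friend'."""
    return [e for e in es if e[1] == "friend"]


def v_in(es):
    """Edges -> their head (incoming) vertices."""
    return [e[2] for e in es]


def name(vs):
    """Vertices -> their name property."""
    return [vertices[v]["name"] for v in vs]


def compose(*steps):
    """Compose single steps; evaluation runs from right to left."""
    return lambda start: reduce(lambda acc, step: step(acc), reversed(steps), start)


f = compose(name, v_in, friend_only, e_out)
print(f([1]))  # ['friend-2', 'friend-3', 'friend-4']
```

Each function is one single-step traversal between vertices, edges or properties; the composed `f` plays the role of a join between a person and the names of that person's friends.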

・Querying is performed through traversals, which can perform millions of "joins" per second

The Graph Traversal Pattern

[Figure: Fig. 3. A single path along the f traversal. Vertices 1 to 4 are joined by friend edges and carry name properties (name=Alberto Pepe is shown); the path is labeled with the steps e_out, e_lab+ (friend), v_in and ε_name.]

... those edges with the label friend, then traverse to the incoming (i.e. head) vertices on those friend-labeled edges. Finally, of those vertices, return their name property (note 21). A single legal path according to this function is diagrammed in Figure 3. Though not diagrammed for the sake of clarity, the traversal would also go from vertex 1 to the name of vertex 2 and vertex 3. The function f is a “higher-order” adjacency defined as the composition of explicit adjacencies and serves as a join of Alberto and his friend’s names (note 22). The remainder of this section demonstrates graph traversals in real-world problem-solving situations.

3.1 Traversing for Recommendation

Recommendation systems are designed to help people deal with the problem of information overload by filtering information in the system that doesn’t pertain to the person [14]. In a positive sense, recommendation systems focus a person’s attention on those resources that are likely to be most relevant to their particular situation. There is a standard dichotomy in recommendation research: that of content- vs. collaborative filtering-based recommendation. The prior deals with recommending resources that share characteristics (i.e. content) with a set of resources. The latter is concerned with determining the similarity of resources based upon the similarity of the taste of the people modeled within the system [6]. These two seemingly different techniques to recommendation are conveniently solved using a graph database and two simple traversal techniques [10, 5]. Figure 4 presents a toy graph data set, where there exist a set of people, resources, and features related to each other by likes- and feature-labeled edges. This simple data set is used for the remaining examples of this subsection.

Note 21: Note that the order of a composition is evaluated from right to left.

Note 22: This is known as a virtual edge in the graph system called DEX [9].

- Graph Traversals

Basic Graph Traversals

- Summary

・GraphDB is efficient with respect to local data analysis (Recommendation, Social Analytics, Shortest Path). These all focus on a single user

・Locality is defined by direct referent structures

・Frame all solutions to problems as a traversal over local regions of the graph

- (3) Index Free Adjacency

∼The Key of Definition of GraphDB∼

- The Definition of GraphDB

※ A GraphDB is not just any database that can model a graph structure (an RDB, a document store, etc. can do that too)

[definition]

・A graph database is any

storage system that provides

index-free adjacency
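A minimal in-memory sketch of index-free adjacency, assuming Python object references stand in for the direct pointers a graph database would store. The class and function names are invented for illustration and do not describe any particular product's storage layout.

```python
class Vertex:
    def __init__(self, name):
        self.name = name
        self.out_edges = []  # direct references to outgoing Edge objects
        self.in_edges = []   # direct references to incoming Edge objects


class Edge:
    def __init__(self, tail, label, head):
        self.tail, self.label, self.head = tail, label, head
        # Each element keeps its own "mini index" of adjacent elements.
        tail.out_edges.append(self)
        head.in_edges.append(self)


def neighbors(vertex):
    """Follow pointers only; no global index is consulted."""
    return [edge.head for edge in vertex.out_edges]


a, b, c = Vertex("A"), Vertex("B"), Vertex("C")
Edge(a, "follows", b)
Edge(a, "follows", c)
print([v.name for v in neighbors(a)])  # ['B', 'C']
```

The work done by `neighbors` depends only on how many edges leave the vertex, not on how many vertices the whole graph contains.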

The Graph Traversal Programming Pattern

- The Definition of GraphDB

[Important feature]

・Mini Index: every element (node or edge) has a direct pointer to its adjacent elements

・No index lookup: we can determine which vertex is adjacent to which other vertex without looking up an index-tree

- Relational Data Model

[Figure, from "The Graph Traversal Programming Pattern": "Graph Databases and Endogenous Indices". Panels: [Index-tree] (name, views and gender property indices, plus an index of vertices by id) and [Graph data in table] (columns column1, column2, column3). Vertex and edge properties shown include name=neo4j, views=56781, page_rank=0.023, name=tenderlove, gender=male, name=peterneubauer, name=graph_blog, views=1000, name=ahzf, name=twarko, age=30 and date=2007/10; edges are labeled cites, created and follows.]

• There is more to the graph than the explicit graph structure.

• Indices index the vertices, by their properties (e.g. ids).

- Relational Data Model

1. Want to determine the neighbors of A

2. Look up the index-tree (log_2(n) time cost)

3. Get the adjacency list (B, C)

4. Move to either B or C

[Figure panels: [Index-tree], [Graph Data]]

- Relational Data Model

・Traversing takes a long time

・The lookup cost becomes very high

[Figure panels: [Index-tree], [Graph Data]]

The lookup cost grows as the graph grows: O(log2 n)

- Cost of Looking Up Index-tree
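To make the O(log2 n) lookup concrete, the sketch below uses a binary search over sorted keys as a stand-in for an index-tree. Only the entry A -> (B, C) comes from the slides; the other adjacency lists and both function names are assumptions.

```python
import bisect
import math

# Sorted keys play the role of the index; each key maps to an adjacency list.
keys = ["A", "B", "C", "D", "E"]
adjacency_lists = [["B", "C"], ["E"], ["D", "E"], [], []]  # mostly hypothetical


def neighbors_via_index(node):
    """Binary-search the sorted keys (O(log2 n)), then read the adjacency list."""
    i = bisect.bisect_left(keys, node)
    if i == len(keys) or keys[i] != node:
        raise KeyError(node)
    return adjacency_lists[i]


def lookup_steps(n):
    """Approximate number of comparisons for one lookup among n keys."""
    return math.ceil(math.log2(n))


print(neighbors_via_index("A"))                     # ['B', 'C']
print(lookup_steps(1_000), lookup_steps(1_000_000))  # 10 20
```

Every hop of a traversal pays this lookup again, so the per-step cost rises as the number of indexed vertices grows.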

・Insert time: as the graph grows in size, the cost of an insert becomes high

・Lookup time: as the graph grows in size, the cost of a lookup grows in proportion to log n, i.e. O(log2 n)

・Memory size: as the graph grows in size, the memory required becomes large

- Graph DB Model

[Mini-Index] Direct references to its adjacent vertices

[Constant time] The cost depends only on the number of connected edges

[Figure: [Graph Data], vertices A to G, each holding its own list of adjacent vertices (e.g. A: B, C)]

- Mini-Index: Graph DB Model

The cost of a local step remains the same

[Graph Data]

- Indexing their properties

・An external indexing system is used to index the properties of vertices and edges
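The division of labor can be sketched like this: a property index is consulted once to find the starting vertex, and the traversal then follows direct references. The property values come from the slide's figure, but the `created` edges, the class and the function name are assumptions made for the example.

```python
class Vertex:
    def __init__(self, **props):
        self.props = props
        self.out = []  # (label, Vertex) pairs: direct references


peter = Vertex(name="peterneubauer")
neo4j = Vertex(name="neo4j", views=56781)
blog = Vertex(name="graph_blog", views=1000)

# Hypothetical edges for illustration.
peter.out.append(("created", neo4j))
peter.out.append(("created", blog))

# External property index: name -> vertex. It is separate from the graph itself.
name_index = {v.props["name"]: v for v in (peter, neo4j, blog)}


def created_by(name):
    start = name_index[name]  # one index lookup to enter the graph
    # From here on, only direct references are followed.
    return [head.props["name"] for label, head in start.out if label == "created"]


print(created_by("peterneubauer"))  # ['neo4j', 'graph_blog']
```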

[Figure, from "The Graph Traversal Programming Pattern": "Graph Databases and Endogenous Indices", showing name, views and gender property indices pointing into the graph]

- Summary

・GraphDB provides index-free adjacency

・No index-tree lookup: each element has direct pointers

・They use an external index system for properties (of both nodes and relations)

・A very large graph can be stored on a single server, because the cost of a traversal step is independent of the size of the graph

- (4) Other Benefits of GraphDB

- A Graph Database Transforms a Key-Value Store

← RDBMS

↓ GraphDB as Key-Value

Comparing Database Models

- A Graph Database Transforms a Document DB

↑ Document DB

↓ GraphDB as RDBMS

Comparing Database Models

- Example of e-commerce site

Square Pegs and Round Holes in the NOSQL World

- Example of e-commerce site

Square Pegs and Round Holes in the NOSQL World

- Did you Understand?

(1) Graph Type for GraphDB

∼Which Graph is Better for GraphDB ?∼

(2) Graph Traversals

∼ Graph Query Graph Traversal ∼

(3) Index Free Adjacency

∼The Key of Definition of GraphDB∼