- Fast Top-k Simple Shortest Paths Discovery in Graphs

Jun Gao, Huida Qiu, Xiao Jiang, Dongqing Yang, Tenjiao Wang

Database Research Group

Department of Computer Science

Peking University

1 - Outline

Motivation

Related Work

Our Method

Experiments

Conclusion

2 - Motivation

From “Finding the k Shortest Paths” by David Eppstein

• Additional constraints

• Model evaluation

• Sensitivity analysis

• Generation of alternatives

When the shortest path alone is not sufficient for an application, the top-k shortest paths are desired.

3 - Top k shortest paths query

• Top 2 general shortest paths (allowing loops):

1st: 1 2 3 4, length: 3

2nd: 1 2 3 6 2 3 4, length: 6

• Top 2 simple shortest paths (without loops):

1st: 1 2 3 4, length: 3

2nd: 1 2 5 3 4, length: 8

[Figure: example weighted graph on nodes 1 to 6]


5 - Top K general shortest path problem

Related work

• David Eppstein. Finding the k shortest paths. SIAM J. Comput. (SIAMCOMP), 28(2):652–673, 1998

Basic Idea

Original Graph

Shortest Path Tree

Side Cost on Edges

Time Complexity

• O(m+nlogn+k)

6 - Top K loopless Shortest Path Problem

Related work

• J. Y. Yen. Finding the k shortest loopless paths in a network. Manage. Sci., 17:712–716, 1971.

Basic Idea

• Find the shortest path first

• The 2nd shortest path should be

- different from the shortest path

- loopless

- shortest in the remaining paths

• Find the next shortest paths iteratively

Time Complexity

• O(kn(m+nlogn))

[Figure: candidate paths on an example graph with nodes s, b, e, g, f, t]
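The iterative scheme above can be sketched in a few lines of Python. This is a minimal illustration of Yen's deviation idea, not the implementation evaluated in this talk. The adjacency-dict format, the function names, and the sample graph are assumptions; the sample edge weights were chosen only so that the path lengths match the earlier top-2 example.

```python
import heapq

def dijkstra(graph, source, target, banned_nodes=frozenset(), banned_edges=frozenset()):
    """Return (cost, path) for a shortest source->target path, or None if unreachable."""
    heap = [(0, source, [source])]
    done = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == target:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for nxt, w in graph.get(node, {}).items():
            if nxt in banned_nodes or nxt in done or (node, nxt) in banned_edges:
                continue
            heapq.heappush(heap, (cost + w, nxt, path + [nxt]))
    return None

def yen_k_shortest(graph, source, target, k):
    """Return up to k loopless shortest paths as (cost, path), in ascending cost order."""
    first = dijkstra(graph, source, target)
    if first is None:
        return []
    found = [first]
    candidates = []  # min-heap of (cost, path)
    while len(found) < k:
        _, last = found[-1]
        for i in range(len(last) - 1):
            root = last[:i + 1]  # shared prefix, ending at the deviation node
            # Ban the next edge of every found path that shares this prefix.
            banned_edges = {(p[i], p[i + 1]) for _, p in found if p[:i + 1] == root}
            # Ban the prefix nodes (except the deviation node) to stay loopless.
            banned_nodes = frozenset(root[:-1])
            tail = dijkstra(graph, last[i], target, banned_nodes, banned_edges)
            if tail is None:
                continue
            root_cost = sum(graph[a][b] for a, b in zip(root, root[1:]))
            cand = (root_cost + tail[0], root + tail[1][1:])
            if cand not in candidates:
                heapq.heappush(candidates, cand)
        if not candidates:
            break
        found.append(heapq.heappop(candidates))
    return found

# Assumed weights, consistent with the path lengths 3 and 8 in the earlier example.
GRAPH = {1: {2: 1}, 2: {3: 1, 5: 3}, 3: {4: 1, 6: 1}, 5: {3: 3}, 6: {2: 1}}
```

On this assumed graph, `yen_k_shortest(GRAPH, 1, 4, 2)` yields the paths 1 2 3 4 (length 3) and 1 2 5 3 4 (length 8). Each deviation runs a fresh shortest path search, which is where the factor n in the time complexity comes from.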

7 - Top K loopless Shortest Path Problem

Related work

• J. Hershberger, S. Suri, and A. Bhosle. On the difficulty of some shortest path problems. ACM Transactions on Algorithms, 3(1), 2007

Basic Idea

• Remove edge to find next shortest path

• Use the intermediate result to lower the cost

Time Complexity

• O(k(m+nlogn))

• Loop in some cases

[Figure: candidate paths on an example graph with nodes s, b, e, g, f, t]


9 - Basic Idea

The key is to reduce the redundant computation cost for the same target node.

We pre-compute the shortest path tree rooted at the target node.

We expect that the candidate path search can be terminated early with the shortest path tree.

The final path is the concatenation of 3 sub-paths: the first sub-path is in the current shortest path, the second one is discovered online, and the third one is in the shortest path tree.

• The existing methods need to discover both the second and third sub-paths online.

10 - Graph Pre-processing

Precompute the shortest path tree rooted at t

Compute the side cost of each edge

Assign a (pre, post, parent) encoding to each node to accelerate loop detection
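The three pre-processing steps can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' code. The side cost is assumed to follow the usual Eppstein-style definition w(u,v) + d(v) - d(u), where d is the distance to t. The (pre, post) labels are assumed to be DFS entry and exit numbers on the tree. The names `preprocess` and `is_ancestor` and the adjacency-dict format are hypothetical.

```python
import heapq

def preprocess(graph, t):
    """graph: {u: {v: weight}} (directed). Returns (dist, parent, side, pre, post)."""
    # Step 1: shortest path tree rooted at t, via Dijkstra on the reversed graph.
    rev = {}
    for u, nbrs in graph.items():
        for v, w in nbrs.items():
            rev.setdefault(v, {})[u] = w
    dist, parent = {t: 0}, {t: None}
    heap = [(0, t)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for u, w in rev.get(v, {}).items():
            if d + w < dist.get(u, float("inf")):
                dist[u] = d + w
                parent[u] = v  # next hop from u toward t
                heapq.heappush(heap, (d + w, u))

    # Step 2: side cost of each edge (assumed definition); tree edges get 0.
    side = {(u, v): w + dist[v] - dist[u]
            for u, nbrs in graph.items() for v, w in nbrs.items()
            if u in dist and v in dist}

    # Step 3: pre/post numbers from an iterative DFS over the tree.
    children = {}
    for v, p in parent.items():
        if p is not None:
            children.setdefault(p, []).append(v)
    pre, post, clock = {}, {}, 0
    stack = [(t, False)]
    while stack:
        v, leaving = stack.pop()
        if leaving:
            post[v] = clock
        else:
            pre[v] = clock
            stack.append((v, True))
            stack.extend((c, False) for c in children.get(v, []))
        clock += 1
    return dist, parent, side, pre, post

def is_ancestor(a, d, pre, post):
    """True if a lies on the tree path from d to the root t (a == d counts)."""
    return pre[a] <= pre[d] and post[d] <= post[a]
```

With these labels, asking whether the tree path from a node to t passes through some other node becomes a constant-time interval comparison rather than a walk along parent pointers.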

11 - Searching for the candidate paths

On the transformed graph, we start searching with the side cost.

The path with the minimal side cost is the path with the minimal cost in the original graph.

During the search, loops need to be detected.

When no loop can be found, the path can be discovered directly.

[Figure: search from the starting node on the transformed graph; node labels s, u, l, i, t, d1, d2]
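The equivalence between minimal side cost and minimal original cost follows from a telescoping sum. The derivation below is a sketch that assumes the usual Eppstein-style definition of side cost, which the slides do not spell out. The symbols are assumptions: $d(v)$ is the shortest distance from $v$ to $t$, $w$ is the edge weight, $\mathrm{sc}$ is the side cost, and $p = v_0 v_1 \dots v_r$ is any path ending at $v_r = t$.

```latex
\begin{align*}
\mathrm{sc}(u,v) &= w(u,v) + d(v) - d(u) \;\ge\; 0, \\
\sum_{i=0}^{r-1} \mathrm{sc}(v_i, v_{i+1})
  &= \sum_{i=0}^{r-1} w(v_i, v_{i+1}) + d(v_r) - d(v_0)
   = w(p) - d(v_0), \qquad d(v_r) = d(t) = 0 .
\end{align*}
```

Since $d(v_0)$ is fixed by the starting node, ordering paths by total side cost is the same as ordering them by original cost. Edges of the shortest path tree have side cost 0, so following the tree to $t$ adds nothing.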

12 - Path Searching Example

The edge from e to g cannot be considered.

d is then considered, but e is an ancestor of d.

c is then considered, but e is an ancestor of c.

f is then considered; the path from f to t does not pass through nodes s, b, or e.
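The search illustrated by this example can be sketched as a Dijkstra run over side costs that stops as soon as a node's tree path to t is loop-free. This is a simplified illustration, not the authors' implementation. The function name `deviation_path` and its inputs are hypothetical: `side` maps edges to side costs, `parent` gives each node's next hop toward t in the shortest path tree, `prefix` is the sub-path from s to the deviation node, and `banned_edges` holds the edges already used from that node. For brevity the loop test walks parent pointers instead of using a (pre, post) encoding.

```python
import heapq

def deviation_path(side, parent, prefix, banned_edges):
    """Return (extra_side_cost, full_path) for the best loopless deviation, or None."""
    u = prefix[-1]            # deviation node
    blocked = set(prefix)     # nodes the rest of the path must avoid
    out = {}
    for (a, b), c in side.items():
        out.setdefault(a, []).append((b, c))

    def tree_path(v):
        # Follow tree edges (side cost 0) from v to the target t.
        path = [v]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])
        return path

    heap = [(0, u, [u])]      # (accumulated side cost, node, online sub-path)
    done = set()
    while heap:
        cost, v, online = heapq.heappop(heap)
        if v in done:
            continue
        done.add(v)
        if v != u:
            tail = tree_path(v)
            # Loop test: the tree path must avoid the prefix and the online part.
            if not (blocked | set(online[:-1])) & set(tail):
                return cost, prefix[:-1] + online + tail[1:]
        for nxt, c in out.get(v, []):
            if nxt in blocked or nxt in done or (v, nxt) in banned_edges:
                continue
            heapq.heappush(heap, (cost + c, nxt, online + [nxt]))
    return None
```

The returned path is the concatenation of three sub-paths: the prefix taken from the current shortest path, the part discovered online, and the remainder read off the shortest path tree. Its length in the original graph is the shortest distance from s to t plus the side cost of the prefix plus the returned extra side cost.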

13 - Optimization-1

k-reduction strategy: stop when

• k1 shortest paths discovered;

• k2 paths in candidate pool have the same length as the k1-th shortest path;

• k1 + k2 ≥ k

14 - Optimization-2

Suppose we know that the length of the shortest path is l1 and the length of the k-th shortest path is l2;

Let th = l2 – l1;

When looking for paths from deviation nodes, we can stop searching once the accumulated side cost exceeds th.

15 - Optimization-2

Approximate threshold: the shortest path is needed; any other k-1 paths will do.

• Eager policy: search for k-1 candidate paths instead of one from the first deviation node;

• Lazy policy: determine the threshold once there are k-1 paths in the candidate pool.

• As more paths are discovered, the threshold can be adaptively updated and slowly becomes tighter.
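The stopping rule of Optimization-1 and the threshold of Optimization-2 can be written as small helper functions. This is a sketch under assumptions, not the authors' code. All function names are hypothetical. `approx_threshold` is one plausible reading of the approximate threshold: it uses the k smallest path lengths known so far as an upper bound on l2 - l1.

```python
import heapq

def k_reduction_done(k, found_lengths, candidate_lengths):
    """Optimization-1: found_lengths holds the k1 shortest path lengths, ascending."""
    if not found_lengths:
        return False
    k1 = len(found_lengths)
    # k2: candidates tied with the k1-th (latest) discovered shortest path.
    k2 = sum(1 for c in candidate_lengths if c == found_lengths[-1])
    return k1 + k2 >= k

def approx_threshold(known_lengths, k):
    """Upper bound on th = l2 - l1 from known simple path lengths.

    Assumes known_lengths includes the true shortest path length l1.
    Returns infinity until k paths are known, so nothing is pruned too early.
    """
    if len(known_lengths) < k:
        return float("inf")
    smallest = heapq.nsmallest(k, known_lengths)
    return smallest[-1] - smallest[0]

def should_prune(accumulated_side_cost, th):
    """Optimization-2: abandon a search branch whose side cost already exceeds th."""
    return accumulated_side_cost > th
```

Calling `approx_threshold` again as the candidate pool grows can only keep or lower the bound, which matches the remark that the threshold becomes tighter as more paths are discovered.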

17 - Experimental Evaluation

Comparison algorithms:

• YEN: Yen’s classic algorithm;

• JH: the edge-replacement based method by John Hershberger et al.

• Implementation: C++, by Hershberger et al.

Our method: all implemented in Java

• KR: the base method with k-reduction;

• KRE: k-reduction plus Eager policy;

• KRL: k-reduction plus Lazy policy;

18 - Datasets

• Real datasets: (Density = # of Edges / # of Nodes)

Dataset   # of Nodes   # of Edges   Density
Add32          4,960        9,462      1.91
Crack         10,240       30,380      2.97
Gupta3        16,783    4,670,105    278.26
FLA        1,070,376    2,712,798      2.53

• Synthetic datasets:

• Random graphs generated by Barabasi Graph Generator (by Derek Dreier, available from the Internet)

19 - Impact of Graph Size

Density = 3, # of nodes from 10k to 100k;

20 - Impact of k

Performed on real graphs.

21 - Impact of Density

k=1000

23 - Conclusion

We speed up top-k shortest path discovery.

• Combine Yen’s and Eppstein’s ideas

• Transform the candidate path discovery to the side cost graph

- Terminate earlier

• Use structural labels to detect loops efficiently.

• Introduce two other optimizations

- Reduce the value of k

- Avoid the worst case of path searching.

24 - Future work

Extend top-k shortest paths between two nodes to two node sets

Find the top-k shortest path core

Find approximate top-k shortest paths

25 - Thanks for your attention!

gaojun@pku.edu.cn
