Fast Top-k Simple Shortest Paths Discovery in Graphs
Jun Gao, Huida Qiu, Xiao Jiang, Dongqing Yang, Tenjiao Wang
Database Research Group, Department of Computer Science, Peking University
Outline
• Motivation
• Related Work
• Our Method
• Experiments
• Conclusion
Motivation
From “Finding the k Shortest Paths” by David Eppstein:
• Additional constraints
• Model evaluation
• Sensitivity analysis
• Generation of alternatives
When the single shortest path is not sufficient for an application, the top-k shortest paths are needed.
Related work: Top-k general shortest path problem
• David Eppstein. Finding the k shortest paths. SIAM J. Comput., 28(2):652–673, 1998.
Basic Idea
[Figure: original graph, shortest path tree, side cost on edges]
Time Complexity
• O(m + n log n + k)
Related work: Top-k loopless shortest path problem
• J. Y. Yen. Finding the k shortest loopless paths in a network. Management Science, 17(11):712–716, 1971.
Basic Idea
• Find the shortest path first
• The second shortest path should be
- different from the shortest path
- loopless
- the shortest among the remaining paths
• Find the next shortest paths iteratively
[Figure: candidate paths on an example graph with nodes s, b, e, g, f, t]
Time Complexity
• O(kn(m + n log n))
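The iteration described on this slide can be sketched in a few lines of Python. This is only an illustrative sketch of Yen's scheme, not the implementation evaluated in the talk: the adjacency-dict graph format and the names `dijkstra` and `yen_k_shortest` are assumptions made for the example, and edge weights are assumed to be non-negative.

```python
import heapq

def dijkstra(graph, src, dst, banned_nodes=frozenset(), banned_edges=frozenset()):
    """Return (cost, path) for a shortest src->dst path, or None if unreachable."""
    heap = [(0, src, [src])]
    done = set()
    while heap:
        cost, u, path = heapq.heappop(heap)
        if u == dst:
            return cost, path
        if u in done:
            continue
        done.add(u)
        for v, w in graph.get(u, {}).items():
            if v in done or v in banned_nodes or (u, v) in banned_edges:
                continue
            heapq.heappush(heap, (cost + w, v, path + [v]))
    return None

def yen_k_shortest(graph, s, t, k):
    """Return up to k loopless s->t paths as (cost, path), in ascending cost."""
    first = dijkstra(graph, s, t)
    if first is None:
        return []
    accepted = [first]
    candidates = []  # min-heap of (cost, path)
    while len(accepted) < k:
        _, last = accepted[-1]
        for i in range(len(last) - 1):
            root = last[:i + 1]  # shared prefix; last[i] is the deviation node
            # Ban the edge each accepted path with this prefix takes next,
            # so the new path must be different from all of them.
            banned_edges = {(p[i], p[i + 1]) for _, p in accepted if p[:i + 1] == root}
            # Ban the earlier prefix nodes, so the new path stays loopless.
            banned_nodes = frozenset(root[:-1])
            found = dijkstra(graph, last[i], t, banned_nodes, banned_edges)
            if found is None:
                continue
            spur_cost, spur_path = found
            root_cost = sum(graph[a][b] for a, b in zip(root, root[1:]))
            cand = (root_cost + spur_cost, root[:-1] + spur_path)
            if cand not in candidates and cand not in accepted:
                heapq.heappush(candidates, cand)
        if not candidates:
            break
        accepted.append(heapq.heappop(candidates))  # shortest remaining path
    return accepted
```

Each round runs one shortest path search per node of the last accepted path, which is where the factor n in the O(kn(m + n log n)) bound comes from.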
Related work: Top-k loopless shortest path problem
• J. Hershberger, S. Suri, and A. Bhosle. On the difficulty of some shortest path problems. ACM Transactions on Algorithms, 3(1), 2007.
Basic Idea
• Remove an edge to find the next shortest path
• Reuse intermediate results to lower the cost
[Figure: candidate paths on an example graph with nodes s, b, e, g, f, t]
Time Complexity
• O(k(m + n log n))
• May produce a path with a loop in some cases
Basic Idea
• The key is to reduce the redundant computation for the same target node.
• We pre-compute the shortest path tree rooted at the target node.
• We expect that the candidate path search can terminate early with the help of the shortest path tree.
• The final path is the concatenation of three sub-paths: the first lies on the current shortest path, the second is discovered online, and the third lies in the shortest path tree.
• Existing methods need to discover both the second and the third sub-paths online.
Graph Pre-processing
• Pre-compute the shortest path tree rooted at t
• Compute the side cost of each edge
• Assign a (pre, post, parent) encoding to each node to accelerate loop detection
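A minimal sketch of the first two pre-processing steps. It assumes the side cost follows Eppstein's definition, side(u, v) = w(u, v) + d(v) - d(u), where d(x) is the shortest distance from x to t; the slide itself does not spell the formula out. The function names and the adjacency-dict graph format are hypothetical.

```python
import heapq

def distances_to_target(graph, t):
    """Dijkstra on the reversed graph.

    Returns (dist, parent): dist[v] is the cost of the shortest v->t path and
    parent[v] is the next node on that path, i.e. v's parent in the shortest
    path tree rooted at t.
    """
    reverse = {}
    for u, nbrs in graph.items():
        for v, w in nbrs.items():
            reverse.setdefault(v, []).append((u, w))
    dist, parent = {t: 0}, {t: None}
    heap = [(0, t)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for u, w in reverse.get(v, []):
            if d + w < dist.get(u, float("inf")):
                dist[u] = d + w
                parent[u] = v
                heapq.heappush(heap, (d + w, u))
    return dist, parent

def side_costs(graph, dist):
    """Extra cost of taking edge (u, v) instead of following the tree from u."""
    return {
        (u, v): w + dist[v] - dist[u]
        for u, nbrs in graph.items()
        for v, w in nbrs.items()
        if u in dist and v in dist  # skip nodes that cannot reach t
    }
```

Under this definition every tree edge has side cost 0, every side cost is non-negative, and the cost of any s->t path equals d(s) plus the sum of the side costs along it, so ranking paths by total side cost ranks them by original cost.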
Searching for the candidate paths
• On the transformed graph, we start searching with the side cost.
• The path with the minimal side cost is the path with the minimal cost in the original graph.
• During the search, loops need to be detected.
• When no loop is found, the path can be discovered directly.
[Figure: search from the starting node; labels u, d1, d2, l, s, i, t]
Path Searching Example
• The edge from e to g cannot be considered.
• d is considered next, but e is an ancestor of d.
• c is considered next, but e is an ancestor of c.
• f is considered next; the path from f to t does not pass through nodes s, b, or e.
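The ancestor checks in this example can be answered in constant time per pair with pre/post-order labels on the shortest path tree. The sketch below is one common way to build such labels and is an assumption about how the (pre, post, parent) encoding might be used, not the authors' code. The tree in the usage note is hypothetical and is only shaped to match the statements on the slide.

```python
def label_tree(parent, root):
    """Assign (pre, post) numbers by an iterative DFS.

    `parent` maps each node to its parent in the tree (None for the root).
    """
    children = {}
    for child, par in parent.items():
        if par is not None:
            children.setdefault(par, []).append(child)
    pre, post, clock = {}, {}, 0
    stack = [(root, False)]
    while stack:
        node, finished = stack.pop()
        if finished:
            post[node] = clock  # all descendants have been numbered
        else:
            pre[node] = clock
            stack.append((node, True))
            for c in sorted(children.get(node, [])):
                stack.append((c, False))
        clock += 1
    return pre, post

def is_ancestor(a, d, pre, post):
    """True if a lies on the tree path from d to the root (a == d included)."""
    return pre[a] <= pre[d] and post[d] <= post[a]

def tree_path_hits_prefix(v, prefix, pre, post):
    """True if following the tree from v to the root would revisit the prefix."""
    return any(is_ancestor(p, v, pre, post) for p in prefix)
```

With a hypothetical tree `parent = {"t": None, "e": "t", "b": "e", "s": "b", "d": "e", "c": "d", "f": "t", "g": "f"}` and prefix `["s", "b", "e"]`, the check reports a loop for d and c (both sit below e) and none for f, mirroring the slide.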
Optimization-1
k-reduction strategy: stop when k1 + k2 ≥ k, where
• k1 is the number of shortest paths discovered so far;
• k2 is the number of paths in the candidate pool that have the same length as the k1-th shortest path.
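The stopping rule can be written as a small predicate. This is a sketch under assumed inputs: `found_lengths` holds the lengths of the paths output so far in ascending order, and `candidate_lengths` holds the lengths of the paths currently in the candidate pool. Both names are hypothetical.

```python
def can_stop(found_lengths, candidate_lengths, k):
    """k-reduction test: stop once k1 + k2 >= k."""
    k1 = len(found_lengths)
    if k1 >= k:
        return True
    if k1 == 0:
        return False  # nothing discovered yet, so there is no k1-th length
    # Candidates tied with the k1-th shortest path cannot be beaten by any
    # path found later, so they can be output without further searching.
    k2 = sum(1 for length in candidate_lengths if length == found_lengths[-1])
    return k1 + k2 >= k
```

For example, with three paths of lengths 5, 7, 8 already output and two candidates of length 8 in the pool, a query for k = 5 can stop immediately.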
Optimization-2
• Suppose we know that the length of the shortest path is l1 and the length of the k-th shortest path is l2.
• Let th = l2 – l1.
• When looking for paths from deviation nodes, we can stop searching as soon as the accumulated side cost exceeds th.
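A rough sketch of how the threshold could bound a search over side costs. It assumes non-negative side costs, so a Dijkstra-style search pops entries in non-decreasing order, and it leaves out loop detection and the early termination through the shortest path tree. The function name, its parameters, and the `side_graph` format are hypothetical.

```python
import heapq

def bounded_side_cost_search(side_graph, start, target, start_cost, th):
    """Cheapest accumulated side cost from start to target, or None if > th.

    `start_cost` is the side cost already accumulated on the prefix that
    leads to the deviation node `start`.
    """
    heap = [(start_cost, start)]
    best = {start: start_cost}
    while heap:
        cost, u = heapq.heappop(heap)
        if cost > th:
            return None  # every remaining entry costs at least this much
        if u == target:
            return cost
        if cost > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, sc in side_graph.get(u, {}).items():
            new_cost = cost + sc
            if new_cost < best.get(v, float("inf")):
                best[v] = new_cost
                heapq.heappush(heap, (new_cost, v))
    return None
```

Because a path's length equals l1 plus its total side cost, a path is no longer than l2 exactly when its side cost is at most th = l2 – l1, which is why the search can be cut off at that point.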
Optimization-2
Approximate threshold: the shortest path is needed; any other k-1 paths will do.
• Eager policy: search for k-1 candidate paths instead of one from the first deviation node;
• Lazy policy: determine the threshold once there are k-1 paths in the candidate pool.
• As more paths are discovered, the threshold is updated adaptively and gradually becomes tighter.
Experimental Evaluation
Comparison algorithms:
• YEN: Yen’s classic algorithm;
• JH: the edge-replacement based method by John Hershberger et al.
• Implementation: C++, by Hershberger et al.
Our method: all implemented in Java
• KR: the base method with k-reduction;
• KRE: k-reduction plus Eager policy;
• KRL: k-reduction plus Lazy policy;
Datasets
• Real datasets (Density = # of Edges / # of Nodes):

Dataset   # of Nodes   # of Edges   Density
Add32          4,960        9,462      1.91
Crack         10,240       30,380      2.97
Gupta3        16,783    4,670,105    278.26
FLA        1,070,376    2,712,798      2.53

• Synthetic datasets:
- Random graphs generated by Barabasi Graph Generator (by Derek Dreier, available from Internet)
Impact of Graph Size Density = 3, # of nodes from 10k to 100k;
Impact of k Performed on real graphs.
Impact of Density k=1000
Conclusion
We speed up top-k shortest path discovery.
• Combine Yen’s and Eppstein’s ideas
• Transform candidate path discovery to the side cost graph
- Terminate earlier
• Use structural labels to detect loops effectively
• Introduce two other optimizations
- Reduce k
- Avoid the worst case of path searching
Future work
• Extend top-k shortest paths from between two nodes to between two node sets
• Find the top-k shortest path core
• Find approximate top-k shortest paths