- Testing Forest-Isomorphism in the Adjacency List Model

Mitsuru Kusumoto†, Yuichi Yoshida†*

† : Preferred Infrastructure, Inc.

* : National Institute of Informatics.

1 - Overview

Given two forests G and H, determine whether G ≅ H or G and H are far from being isomorphic, by looking at only very small parts of G and H.

[Figure: two forests G and H; is G ≅ H?]

Outline

Introduction

Property testing

Problem setting

Our algorithms

2 / 21 - Introduction

3 - Property Testing

We want to solve decision problems as efficiently as possible!

Example : Graph connectivity

Connected

Not connected

Standard setting : BFS is enough. → Θ(n) time.

Property testing : Check whether G is connected or far from being connected. → O(1) time!?
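To make the contrast with BFS concrete, here is a minimal sketch of a sampling-based connectivity check. It is not taken from these slides: the function name, the number of samples, and the component-size threshold are illustrative assumptions, and no accuracy guarantee is claimed for any particular parameter values. The idea is only that a graph far from connected must contain many small components, so a few truncated searches from random vertices are likely to find one.

```python
import random
from collections import deque

def looks_connected(adj, num_samples, max_component, rng=None):
    """Heuristic sketch: reject if a sampled vertex lies in a small component.

    adj: dict mapping each vertex to a list of neighbours (adjacency lists).
    num_samples, max_component: illustrative parameters (assumptions).
    """
    rng = rng or random.Random(0)
    vertices = list(adj)
    if len(vertices) <= 1:
        return True
    for _ in range(num_samples):
        start = rng.choice(vertices)
        seen = {start}
        queue = deque([start])
        # Truncated BFS: stop once more than max_component vertices are seen.
        while queue and len(seen) <= max_component:
            v = queue.popleft()
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    queue.append(u)
        # The search ran out of vertices inside a small set that is not the
        # whole graph: this is a witness that the graph is disconnected.
        if not queue and len(seen) <= max_component and len(seen) < len(vertices):
            return False
    return True
```

A connected graph is never rejected, because rejection requires an explicit small component. Note that each search step scans a full neighbour list, so the cost is bounded only when degrees are bounded.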

4 / 21 - Property Testing

A property testing algorithm is a (randomized) algorithm that, with high probability (e.g., ≥ 2/3), distinguishes inputs that satisfy a property P from inputs that are far from satisfying P, using a sublinear number of queries or sublinear time.

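We want to distinguish connected graphs from graphs that are far from being connected.

[Figure: inputs are divided into "connected" and "not connected"; the latter is divided into "close to being connected" and "far from being connected".]

Main Interest

What kinds of properties are testable efficiently?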

5 / 21 - Graph Property Testing - Review

The efficiency of property testing algorithms depends on the input model.

Adjacency matrix model

• Input model for dense graphs. [GGR’98]

• Many properties are testable (e.g., connectivity, △-freeness, ...).

• Necessary and sufficient conditions for constant-time testability are known. [Alon+’09]

Example:

    [01010]
    [10110]
G = [01001]
    [11001]
    [00110]

Adjacency list model

• Input model for sparse graphs. [GR’02] [KKR’04]

• Many properties are testable (e.g., connectivity, H-minor-freeness).

• But many results assume the bounded-degree condition: the degrees of vertices must be bounded by some constant.

Example: a vertex v whose edges 1, 2, and 3 lead to A, B, and C, so that O(v, 1) = A, O(v, 2) = B, O(v, 3) = C.

6 / 21 - Graph Property Testing - Review


What happens if we do not assume the bounded-degree condition?

Only a few efficient algorithms are known.

Many hardness results: testing △-freeness, k-colorability, etc. requires Ω(√n) queries. [A+08, B+08, K+04]

Question : Is it possible to obtain efficient algorithms for fundamental problems without the bounded-degree condition?

7 / 21 - Forest-Isomorphism

We focus on forest-isomorphism in the adjacency list model.

[Figure: two forests G and H; is G ≅ H?]

Input : Two forests G and H represented by adjacency lists, and a proximity parameter ε > 0.

Query Model : We can access G and H via the following queries:

deg(v): returns the degree of vertex v.

adj(v, i): returns the vertex adjacent to v via its i-th edge.

random(): returns a randomly chosen vertex.
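The query model can be pictured as a thin wrapper around adjacency lists that counts every access. The following is a minimal sketch: the class name `ForestOracle`, the `queries` counter, the 1-based edge index, and uniform sampling in `random()` are assumptions made for illustration, not details stated on the slides.

```python
import random

class ForestOracle:
    """Query access to a forest stored as adjacency lists, with a query counter."""

    def __init__(self, adjacency, seed=None):
        # adjacency: dict vertex -> list of neighbours; list order fixes edge indices.
        self._adj = adjacency
        self._vertices = list(adjacency)
        self._rng = random.Random(seed)
        self.queries = 0  # total number of queries issued so far

    def deg(self, v):
        self.queries += 1
        return len(self._adj[v])

    def adj(self, v, i):
        # i is 1-based (assumption), matching the "i-th edge" wording.
        self.queries += 1
        return self._adj[v][i - 1]

    def random(self):
        # Uniform sampling over vertices (assumption).
        self.queries += 1
        return self._rng.choice(self._vertices)
```

A tester written against this interface can report its query complexity simply by reading `queries` at the end of a run.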

8 / 21 - Forest-Isomorphism


ε-Farness : d(G, H) := the minimum number of edge additions and deletions needed to transform G into a graph isomorphic to H (graph edit distance).

For ε > 0, G and H are ε-far from being isomorphic ⇔ d(G, H) ≥ εn, where n is the number of vertices.

Objective: Decide whether G ≅ H or d(G, H) ≥ εn.
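For very small graphs the distance d(G, H) can be computed by brute force, which helps when checking the definition by hand. The sketch below tries every bijection between the vertex sets and counts the edges that differ; the function name and the convention that both graphs use vertices 0..n-1 are assumptions for illustration. It takes time proportional to n! and is in no way a testing algorithm.

```python
from itertools import permutations

def edit_distance(n, edges_g, edges_h):
    """Brute-force d(G, H) for graphs on vertices 0..n-1 (tiny n only)."""
    g = {frozenset(e) for e in edges_g}
    h = {frozenset(e) for e in edges_h}
    best = None
    for perm in permutations(range(n)):
        # Relabel G by perm, then count edges present in exactly one graph.
        mapped = {frozenset(perm[v] for v in e) for e in g}
        cost = len(mapped ^ h)
        if best is None or cost < best:
            best = cost
    return best
```

For the path and the star on four vertices the function returns 2, so with n = 4 this pair is ε-far exactly when ε ≤ 1/2.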

9 / 21 - Forest-Isomorphism


Motivation

The problem is fundamental: a forest is a simple structure, and isomorphism is a theoretically important problem.

Isomorphism has occasionally been considered in the property testing literature. [AS’05, AS’08, NS’11]

10 / 21 - Forest-Isomorphism


Related Work

If there is no restriction on the input, testing graph isomorphism in the adjacency list model requires Ω(√n) queries. [FM’08]

This is good motivation for our focus on forests.

If the input is a bounded-degree hyperfinite graph, then graph isomorphism is constant-time testable. [NS’11]

But without a degree bound, testability was unknown.

11 / 21 - Our Contribution

Query complexity

Upper bound: poly(log n)

Lower bound: Ω(√log n)

Furthermore, we obtained a more general result:

If the input is a forest, every graph property is testable with poly(log n) queries in the adjacency list model.

We use a technique similar to that of [Newman and Sohler’11].

12 / 21 - Overview of Our Algorithm

13 - Overview of Our Method

1. Partitioning oracle: We define a procedure that removes a small fraction of the edges to partition the graph into several parts with “good” properties.

2. We check whether each pair of corresponding parts in G and H is isomorphic or far from being so.

If G and H are far from being isomorphic, there is at least one pair of corresponding parts in G and H that is also far from being isomorphic.

[Figure: G and H are each passed through the partitioning oracle.]

14 / 21 - Partitioning Oracle

Partitioning Oracle: Given ε > 0 and access to G, there exist an integer s = s(ε) and a subgraph G’ ⊆ G such that

|E(G) – E(G’)| ≤ εn / 3, and

each connected component of G’ is either an s-bounded-degree tree or an s-rooted tree.

s-bounded-degree tree: A tree where (degree of each vertex) < s.

s-rooted tree: A tree where there exists v ∈ V(T) s.t. deg(v) ≥ s and (size of each subtree) < s. (We call the vertex v a root.)

[Figure: an s-rooted tree with root v.]
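The two component types can be checked directly on a small tree. The sketch below is an illustration only: the function names are hypothetical, and it assumes that "size of each subtree" means the number of vertices in each piece left after removing the root v.

```python
def component_size(adj, start, blocked):
    """Number of vertices reachable from start without passing through blocked."""
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for u in adj[v]:
            if u != blocked and u not in seen:
                seen.add(u)
                stack.append(u)
    return len(seen)

def classify_tree(adj, s):
    """Return 's-bounded-degree', ('s-rooted', root), or None for a tree adj."""
    if all(len(nbrs) < s for nbrs in adj.values()):
        return "s-bounded-degree"
    for v, nbrs in adj.items():
        # v is a root if deg(v) >= s and every subtree hanging off v has size < s.
        if len(nbrs) >= s and all(component_size(adj, u, v) < s for u in nbrs):
            return ("s-rooted", v)
    return None
```

A tree that returns `None` is of neither type, which is why the oracle has to remove some edges before every component fits one of the two definitions.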

15 / 21 - Partitioning Oracle


We can provide query access to G’.

Alive Edge Query: Check if edge (v, i) still exists in G’.

The subgraph G’ is chosen deterministically.

If G ≅ H, then G’ ≅ H’.

[Figure: vertex v with edges 1, 2, 3 leading to A, B, C. (v, 1) : not alive; (v, 2) : not alive; (v, 3) : alive.]

16 / 21 - Partitioning Oracle


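So…

If d(G, H) = 0, then d(G’, H’) = 0, because G’ and H’ are chosen deterministically.

If d(G, H) ≥ εn, then d(G’, H’) ≥ εn / 3, because we remove at most εn / 3 edges from each of G and H: by the triangle inequality, d(G, H) ≤ d(G, G’) + d(G’, H’) + d(H’, H) ≤ d(G’, H’) + 2εn / 3.

Thus, it is enough to consider the partitioned graphs G’ and H’.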

17 / 21 - Graph Partition

Suppose that G is obtained through the partitioning oracle.

We split G into the following parts for some constants α, γ > 1.

G[0] := s-bounded-degree trees in G

G[1] := s-rooted trees in G with root degrees in [s, αγ)

G[2] := s-rooted trees in G with root degrees in [αγ, αγ^2)

G[3] := s-rooted trees in G with root degrees in [αγ^2, αγ^3)

...

This gives O(log n) parts: G[0], G[1], G[2], ...
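The geometric ranges above determine the part of an s-rooted tree from its root degree alone. Here is a minimal sketch of that lookup; the function name is hypothetical and the values of s, α, and γ used in any example are arbitrary, since the slides do not fix them. Because a root degree is below n and the upper limit grows by a factor γ > 1 each step, the loop runs O(log n) times, matching the number of parts.

```python
def part_index(root_degree, s, alpha, gamma):
    """Index i >= 1 of the part G[i] for an s-rooted tree with this root degree."""
    if root_degree < s:
        raise ValueError("an s-rooted tree has root degree at least s")
    i = 1
    upper = alpha * gamma  # G[1] covers root degrees in [s, alpha*gamma)
    while root_degree >= upper:
        upper *= gamma     # G[i] covers [alpha*gamma^(i-1), alpha*gamma^i)
        i += 1
    return i
```

With the illustrative values s = 4, α = 4, γ = 2, the parts cover root degrees [4, 8), [8, 16), [16, 32), and so on.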

18 / 21 - Isomorphism between Corresponding Parts

Graph partition is useful in the following sense.

Lemma. d(G, H) ≤ Σ_i d(G[i], H[i]).

Proof. Transforming G[i] into H[i] for each i transforms G into H. □

Corollary. If d(G, H) ≥ εn, then for any β_i > 0 with Σ_i β_i = ε, there exists i such that d(G[i], H[i]) ≥ β_i n. □

Thus, it suffices to test isomorphism between G[i] and H[i] for each i = 0, 1, 2, ….

We set β_0 = ε/2 and β_1 = β_2 = … = O(ε / log n).
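The corollary follows from the lemma by an averaging argument. The step is spelled out here for completeness; it is filled in from the lemma rather than copied from the slides.

```latex
Suppose, for contradiction, that $d(G[i], H[i]) < \beta_i n$ for every $i$.
Then the lemma gives
\[
  d(G, H) \;\le\; \sum_i d(G[i], H[i]) \;<\; \sum_i \beta_i n \;=\; \varepsilon n,
\]
which contradicts $d(G, H) \ge \varepsilon n$.
```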

19 / 21 - Isomorphism between Corresponding Parts

Testing G[i] ≅ H[i]

For i = 0 : We can use a tester for the bounded-degree model [NS’11].

For i ≥ 1 : We develop a new algorithm.

Sketch : We randomly sample root vertices.

For each sampled root vertex, we randomly sample its subtrees and create a histogram of subtrees.

After this, we compute the minimum matching between the histograms in G and H.

This minimum matching turns out to be a good approximation to d(G, H).

[Figure: a histogram of subtree shapes with counts :2, :1, :2, …]
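The histogram step can be illustrated with canonical encodings of rooted subtrees. The sketch below is a simplification and not the authors' algorithm: the sampling is left to the caller (`sampled_children` is passed in), the function names are hypothetical, and a plain L1 distance between two histograms stands in for the minimum matching mentioned above. Since every subtree of an s-rooted tree has fewer than s vertices, the recursion stays shallow.

```python
from collections import Counter

def canonical_form(adj, v, parent):
    """AHU-style encoding: equal strings iff the rooted subtrees are isomorphic."""
    children = sorted(canonical_form(adj, u, v) for u in adj[v] if u != parent)
    return "(" + "".join(children) + ")"

def subtree_histogram(adj, root, sampled_children):
    """Count isomorphism types among the sampled subtrees hanging off root."""
    return Counter(canonical_form(adj, c, root) for c in sampled_children)

def histogram_distance(hist_g, hist_h):
    """L1 distance between two histograms (a simple stand-in, see lead-in)."""
    keys = set(hist_g) | set(hist_h)
    return sum(abs(hist_g[k] - hist_h[k]) for k in keys)
```

Two roots whose sampled subtrees have the same shapes with the same multiplicities get distance 0; any mismatch in shape counts shows up as a positive distance.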

20 / 21 - Conclusion

Query complexity

Upper bound: poly(log n)

Lower bound: Ω(√log n)

Actually, the upper bound is O(log^{2^{poly(1/ε)}} n).

If the input is a forest, every graph property is testable with poly(log n) queries.

Future Work?

Can we obtain similar results for graph classes larger than forests?

Outerplanar graphs, bounded-treewidth graphs, scale-free graphs, …

21 / 21 - Appendix : Lower bound

22 - Lower bound - Overview

1. We construct two distributions of inputs, D1 and D2.

∀(G, H) ∈ D1, G ≅ H

∀(G, H) ∈ D2, d(G, H) ≥ n/8

2. We reduce isomorphism testing to checking whether two probability distributions are the same. This requires Ω(√N) queries.

[Figure: two pairs of forests, each asking whether G ≅ H.]

23 / 21 - Lower bound

Let F_k := n / (2^k log n) copies of a star graph with 2^k vertices.

(Note that |F_k| = n / log n.)

[Figure: F_0, F_1, F_2, F_3, …, F_{log n}.]
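The size remark can be verified mechanically: the number of copies times the size of one star is the same for every k. The sketch below builds one star and computes the total vertex count of each forest with exact fractions, so rounding of the number of copies is deliberately ignored. The function names and the sample values n = 1024, log n = 10 are assumptions for illustration.

```python
from fractions import Fraction

def star(k):
    """Adjacency lists of a star with 2**k vertices: centre 0, leaves 1..2**k - 1."""
    leaves = list(range(1, 2 ** k))
    adj = {0: leaves}
    for leaf in leaves:
        adj[leaf] = [0]
    return adj

def forest_vertices(n, log_n, k):
    """Total vertex count of the k-th star forest as an exact fraction."""
    copies = Fraction(n, 2 ** k * log_n)  # n / (2^k log n) copies
    return copies * len(star(k))
```

Every forest therefore contributes the same number of vertices, so a uniformly random vertex is equally likely to fall in each of them.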

24 / 21 - Lower bound

Construct two distributions D1, D2 :

D1 : G = H, where

G = F_0 ∪ F_1 ∪ … ∪ F_{log n}

H = F_0 ∪ F_1 ∪ … ∪ F_{log n}

D2 : randomly assign each F_k to either G or H so that |V(G)| = |V(H)|.

[Figure: F_0, F_1, …, F_{log n} being distributed between G and H.]

25 / 21 - Lower bound

Because we can perform only random sampling and degree/neighbor queries, checking whether G ≅ H is equivalent to checking whether two probability distributions are the same.

Lemma. We need Ω(√log n) queries to distinguish D1 and D2.

[Figure: probability of observing each of F_0, F_1, F_2, …, F_{log n} by random sampling; curves labeled G, G=H, and H.]

26 / 21 - Lower bound

Lemma. ∀(G, H) ∈ D2, d(G, H) ≥ n/8.

Proof.

Let Φ: V(G) → V(H) be a bijection that achieves the minimum graph edit distance. It holds that

d(G, H) ≥ Σ_{v ∈ V(G)} |deg(v) – deg(Φ(v))| / 2.

If we restrict v in the sum to the roots of the stars, we obtain

d(G, H) ≥ Σ_{k=2,3,4,...} (n / (2^k log n)) ∙ 2^{k-1} / 2 ≥ n/8. □

Thus, the Ω(√log n) lower bound holds.
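Each term of the final sum in the proof simplifies to the same value, which makes the bound easier to check. The simplification below is certain; the closing remark about how many terms are needed is an inference, since the slides do not state which values of k contribute to the sum.

```latex
\[
  \frac{n}{2^{k}\log n}\cdot\frac{2^{k-1}}{2}
  \;=\; \frac{n}{4\log n}
  \qquad\text{for every } k .
\]
Hence the sum is at least $n/8$ as soon as it contains
at least $(\log n)/2$ such terms.
```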
