This page reproduces the content of http://www.slideshare.net/KotaAbe/constructing-distributed-doubly-linked-list-without-distributed-locking.

by Kota Abe

Uploaded about 1 year ago (2015/10/01) in Technology

Explains a novel distributed algorithm for constructing distributed doubly linked lists (or bidirectional ring), which are common in structured P2P networks.

This presentation was given at the 2015 IEEE International Conference on Peer-to-Peer Computing (P2P2015).

The paper is available at

http://rabbit.media.osaka-cu.ac.jp/research/index.php/Constructing_Distributed_Doubly_Linked_Lists_without_Distributed_Locking

Author: Kota Abe (Osaka City University/NICT), Mikio Yoshida (BBR Inc.)

Abstract:

A distributed doubly linked list (or bidirectional ring) is a fundamental distributed data structure commonly used in structured peer-to-peer networks. This paper presents DDLL, a novel decentralized algorithm for constructing distributed doubly linked lists. In the absence of failure, DDLL maintains consistency with regard to lookups of nodes, even while multiple nodes are simultaneously being inserted or deleted. Unlike existing algorithms, DDLL adopts a novel strategy based on conflict detection and sequence numbers. A formal description and correctness proofs are given. Simulation results show that DDLL outperforms conventional algorithms in terms of both time and number of messages.

- Constructing Distributed Doubly Linked List without Distributed Locking

IEEE Peer-to-Peer Conference 2015

Sep 23rd–24th, 2015

Kota Abe, Osaka City University / NICT, Japan

Mikio Yoshida, BBR Inc., Japan

1 - Outline

Background

What is distributed doubly linked list

Conventional approaches

The DDLL algorithm

Procedure for node insertion, deletion and traversal

Procedure for recovery from failure

Evaluation

Comparison with conventional algorithms

Conclusion

3 - Distributed Doubly Linked List

aka Bidirectional Ring

Commonly used in structured P2P networks

Chord, Chord#, Skip Graph, SkipNet, etc.

Structure

Pointer (e.g. IP address) to the next (successor) node and previous (predecessor) node

We call them right and left pointers

Sorted by node-specific key

Circular

[Figure: ring of nodes with keys 0, 10, 20, 30, 40, 50, 60, 70]

4 - Maintaining Distributed Doubly Linked List

Insertion

Deletion

Traversal

Recovery

[Figure: insertion, deletion, traversal, and recovery shown on nodes p, q, r, u]

Challenges

Nodes are distributed and may be simultaneously and independently inserted and deleted

Nodes may fail

5 - Conventional Approaches (1/2)

Eventual Consistency Approach

Node insertion and deletion temporarily breaks the list structure

Stabilizing procedure recovers

Chord

Distributed Locking Approach

Use a lock 🔒 to mutually exclude node insertion / deletion

Atomic Ring Maintenance (Ghodsi)

[Figure: insertion of u between p and q under distributed locking. Message sequence: JoinReq, JoinPoint, NewSucc, NewSuccAck, JoinDone; locks are taken (🔒) around JoinReq and released (🔓) around NewSuccAck and JoinDone]

6 - Conventional Approaches (2/2)

Eventual Consistency Approach

Pros 👍

Easy to recover from failure

Cons 👎

No lookup consistency: Lookup results may differ depending on the querying node

Distributed Locking Approach

Pros 👍

Lookup consistency

Cons 👎

Lock disturbs another node insertion / deletion

When a node fails, locking duration may be quite long

Recovery procedure is rather complicated

Release a lock by timeout, which may be premature

→ locks should not be used if possible

8 - Our Contribution — DDLL Algorithm

DDLL = Distributed algorithm for constructing distributed doubly linked lists

Acronym of “Distributed Doubly Linked List”

Guarantees lookup consistency without using distributed locking (in absence of failure)

Simple and Efficient

Proved correctness (insertion and deletion procedure)

Practical

Works with non-FIFO channels (e.g. UDP)

Used in our PIAX P2P platform as a foundation of Skip Graph and Chord# implementations

9 - Node Insertion

u is going to be inserted between p and q

(1) u.l := p, u.r := q

(2) Update right link: Change p’s right link to u

(3) Update left link: Change q’s left link to u

[Figure: four snapshots of p, u, q showing the links after each step]

10 - Updating Right Link (1/3)

We want to change p’s right link only if there is no conflict

Conflicts

q has been deleted

p has been deleted

another node has been inserted between p and q

[Figure: the three conflict cases, drawn with nodes o, p, q, r, u, v]

11 - Updating Right Link (2/3)

SetR message is used for updating a right link

SetR message contains:

new right node

expected right node of the recipient node

When a SetR request is accepted, p returns a SetRAck message

Otherwise, p returns a SetRNak message

[Figure: u sends SetR(u, q) to p, meaning "Please change your right link to me (u) if your right link still points to q and you have not initiated deletion"; p replies SetRAck ("Ok!")]

12 - Updating Right Link (3/3)

Conflict case example: another node has been inserted between p and q

[Figure: v has already been inserted between p and q; u sends SetR(u, q) to p, and p replies SetRNak ("Sorry!") because p.r != q]

Right links are always correct without using locking
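
The SetR check on the last three slides can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the `Node` class, the `deleting` flag and the integer keys are assumptions made for the example, and sequence numbers are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    right: Optional[int] = None   # key of the right (successor) node
    deleting: bool = False        # hypothetical flag: node has initiated its own deletion


def handle_set_r(p: Node, new_right: int, expected_right: int) -> str:
    """Process SetR(new_right, expected_right) at node p and return the reply name."""
    if p.deleting or p.right != expected_right:
        return "SetRNak"          # conflict: p is leaving, or p's right link has changed
    p.right = new_right           # no conflict: switch the right link
    return "SetRAck"


p = Node(key=10, right=20)        # p -> q, with q = 20
print(handle_set_r(p, 15, 20))    # u = 15 asks first
print(handle_set_r(p, 17, 20))    # v = 17 still expects q as p's right node
```

The check and the update happen inside one message handler on p, so no lock spanning several nodes is needed; the second request is refused because p's right link no longer points to q.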

13 - Updating Left Link (1/3)

Problem:

Multiple SetL messages arrive from different nodes in arbitrary order (because we do not want to use locking)

Node must determine which SetL message is newer

[Figure: topology change and message sequence for inserting u and then v between p and q. Messages shown: SetR(u, q), SetRAck, SetR(v, q), SetL(v), SetRAck, SetL(u); q receives both SetL messages and cannot tell which is newer (!?)]

14 - Updating Left Link (2/3)

Solution:

SetL message contains a sequence number (seq)

Each node holds a sequence number for its right node (rseq)

rseq is transferred using SetRAck

Each node holds the max sequence number of SetL messages received so far (lseq)

SetL message is accepted only if msg.seq > lseq

[Figure: rseq and lseq values as u and then v are inserted between p and q, starting from rseq = 0 and lseq = 0. Labels shown: SetRAck(1), SetL(u, 1), SetRAck(2), SetL(u, 2); q ends with lseq = 2]

15 - Updating Left Link (3/3)

How our scheme solves the previous case

[Figure: topology change and message sequence with sequence numbers, starting from p.rseq = 0 and q.lseq = 0. Messages shown: SetR(u, q, 0), SetRAck(1), SetR(v, q), SetRAck(2), SetL(v, 2), SetL(u, 1); q already has lseq = 2 when SetL(u, 1) arrives]

This SetL message is stale and ignored

Lock is not necessary!
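
The acceptance rule for SetL (accept only if msg.seq > lseq) is small enough to show directly. The sketch below is an illustration under assumptions: `LeftLink` and `handle_set_l` are hypothetical names, and nodes are represented by plain strings.

```python
from dataclasses import dataclass


@dataclass
class LeftLink:
    left: str        # name of the current left node
    lseq: int = 0    # largest SetL sequence number accepted so far


def handle_set_l(link: LeftLink, new_left: str, seq: int) -> bool:
    """Apply SetL(new_left, seq); return True if the message was accepted."""
    if seq > link.lseq:
        link.left, link.lseq = new_left, seq
        return True
    return False     # stale message: ignore it


q = LeftLink(left="p")
# SetL(v, 2) overtakes SetL(u, 1), as can happen on a non-FIFO channel.
for new_left, seq in [("v", 2), ("u", 1)]:
    accepted = handle_set_l(q, new_left, seq)
    print(f"SetL({new_left}, {seq}) accepted={accepted}")
print(q.left, q.lseq)
```

Whatever order the two messages arrive in, q ends with its left link pointing to v and lseq = 2, which is why arrival order does not matter.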

16 - Node Insertion Sequence

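[Figure: topology change and message sequence for inserting u between p and q, where p.rseq = q.lseq = i. u sends SetR(u, q, 0) to p; p sends SetL(u, i+1) to q and SetRAck(i+1) to u. Afterwards the link p–u carries sequence number 0 and the link u–q carries i+1]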

17 - Node Deletion Sequence

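[Figure: topology change and message sequence for deleting u from between p and q, where the link p–u carries sequence number i1 and the link u–q carries i2. u sends SetR(q, u, i2+1) to p; p sends SetL(p, i2+1) to q and SetRAck(i1+1) to u. Afterwards the link p–q carries i2+1]

i1+1 is not used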

18 - Insertion and Deletion

3 messages are required for insertion/deletion

A node is atomically inserted/deleted when SetR message is accepted

If SetRNak message is received, application retries insertion/deletion

Right links are always correct

Left links are correct when there is no SetL message in transmission

No distributed locking

Does not require FIFO channel (UDP friendly)

19 - Traversals

Every inserted node can be looked up either rightward or leftward

Traversing rightward: easy

Traversing leftward: left links are not always correct

1. Node X visits q and fetches q.l (= p)

2. X visits p and fetches p.l and p.r (= u)

3. X detects that u is missed (because p.r != q) and X visits u

[Figure: X traversing leftward over p, u, q, where q has an incorrect left link; the visits are numbered 1 to 3]
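
The leftward traversal rule can be sketched as follows. This is a minimal in-memory illustration assuming no failures; the `Node` class and the helper names `true_left` and `walk_left` are hypothetical, and a real traversal would fetch `left` and `right` from remote nodes instead of a local dictionary.

```python
from dataclasses import dataclass


@dataclass
class Node:
    left: str
    right: str


def true_left(nodes: dict[str, Node], cur: str) -> str:
    """Return the actual left neighbor of cur, correcting a stale left link."""
    cand = nodes[cur].left
    # Right links are always correct, so walk rightward from the candidate
    # until reaching the node whose right link points at cur.
    while nodes[cand].right != cur:
        cand = nodes[cand].right
    return cand


def walk_left(nodes: dict[str, Node], start: str, count: int) -> list[str]:
    """Visit `count` nodes leftward from start, in ring order."""
    visited, cur = [], start
    for _ in range(count):
        cur = true_left(nodes, cur)
        visited.append(cur)
    return visited


# Ring p -> u -> q -> p. q's left link is stale and still points at p,
# because the SetL message announcing u has not arrived yet.
nodes = {
    "p": Node(left="q", right="u"),
    "u": Node(left="p", right="q"),
    "q": Node(left="p", right="p"),
}
print(walk_left(nodes, "q", 3))
```

Starting from q, the walk reports u before p even though q.l still points at p, so no inserted node is skipped. The slide's X contacts p first and then u; the sketch only reorders the result into ring order.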

20 - Insertion Retry Optimization

Insertion requires pointers to the immediate left and right nodes

When an inserting node receives SetRNak, the node retries

Optimization: SetRNak contains the pointer to the right node

Extra messages can be eliminated if p has not initiated deletion AND u ∈ (p, p.r)

[Figure: message sequences for u and v inserting between p and q. Unoptimized: after SetRNak, u exchanges GetR and MyR(v) with p before sending SetR(u, v). Optimized: u receives SetRNak(v) and sends SetR(u, v) directly. Both end with SetRAck and SetL]
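
The retry condition u ∈ (p, p.r) is a membership test on a circular key space, which is easy to get wrong at the wrap-around point. The sketch below is an assumption-laden illustration: integer keys, the helper names, and the idea that u learns whether p is deleting via a `p_deleting` flag are all hypothetical.

```python
def in_open_interval(x: int, a: int, b: int) -> bool:
    """True if key x lies strictly between a and b going rightward on the ring."""
    if a < b:
        return a < x < b
    # The interval wraps past the largest key (a == b means a single-node ring).
    return x > a or x < b


def next_step(u: int, p: int, p_right: int, p_deleting: bool) -> str:
    """Decide what inserting node u does after receiving SetRNak(p_right) from p."""
    if not p_deleting and in_open_interval(u, p, p_right):
        return f"SetR({u}, {p_right}) to {p}"   # retry at once, no extra lookup
    return "locate insertion position again"


# p = 10, q = 20, u = 15. Node v = 17 was inserted between p and q first.
print(next_step(15, 10, 17, False))
# If instead v = 12 was inserted first, u no longer belongs right after p.
print(next_step(15, 10, 12, False))
```

In the first case u can reuse p as its left node and only replaces the expected right node with v, which matches the SetR(u, v) shown on the slide.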

21 - Handling failure

So far, no failure is assumed

DDLL algorithm considers:

Crash failure

Omission failure (omitted in this presentation)

Timing failure (omitted in this presentation)

In an asynchronous network, it is impossible to distinguish slow nodes from failed nodes

Erroneously suspected nodes are temporarily removed but eventually recovered

22 - Recovery | Basic

Each node maintains a neighbor node set N

N contains sufficient number of left-side nodes

Each node u periodically finds live closest left-side node v

u obtains v.r and v.rseq

If (v = u.l) ∧ (v.r = u) ∧ (v.rseq = u.lseq) then OK

Otherwise, start recovery

[Figure: nodes A, B, C with v and u marked. The link to B is broken and C sends SetR(C, B, ?) to repair it; the sequence numbers (rseq at v, lseq at u) to use are left open (?)]
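
The periodic check on this slide can be written as a single predicate. The sketch below is an in-memory illustration, not the authors' code: the `Node` fields, the `alive` flag standing in for failure detection, and the `needs_recovery` name are assumptions, and a real node would obtain v.r and v.rseq over the network.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    left: Optional[str] = None
    right: Optional[str] = None
    lseq: int = 0
    rseq: int = 0
    alive: bool = True            # stand-in for a failure detector


def needs_recovery(u: Node, neighbors: list[Node]) -> bool:
    """neighbors is u's left-side neighbor set N, ordered closest first."""
    v = next((n for n in neighbors if n.alive), None)
    if v is None:
        raise RuntimeError("no live left-side neighbor known")
    ok = v.name == u.left and v.right == u.name and v.rseq == u.lseq
    return not ok


a = Node("A", right="B", rseq=3)
b = Node("B", left="A", right="C", lseq=3, rseq=5)
c = Node("C", left="B", lseq=5)

print(needs_recovery(c, [b, a]))   # B is alive and consistent with C
b.alive = False
print(needs_recovery(c, [b, a]))   # closest live left-side node is now A
```

Once B is considered failed, the closest live left-side node A does not match C's left link, so C starts recovery.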

23 - Recovery | Sequence Number (1)

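Let’s consider the sequence number of the recovered link

Assigning C.lseq + 1 ?

[Figure: nodes A, B, C, where the link into C carries sequence number i. C sends SetR(C, B, ?) to repair the link; with C.lseq + 1 the message becomes SetR(C, B, i+1) and the recovered link carries i+1]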

24 - Recovery | Sequence Number (2)

Subtle Case

X inserts between B and C

B fails while SetL to C is still in transmission

C starts recovery w/o noticing X

Both A and X have the same right node (C) and the same rseq (i +1)

C’s left link may roll back!

[Figure: nodes A, B, X, C. The SetL message for X is still in flight while C sends SetR(C, B, i+1); afterwards both A and X hold rseq = i+1 for C]

25 - Recovery | Sequence Number (3)

Solution:

Extend sequence number: (recovery-number, seq)

Recovery number is increased only on recovery

Left links do not roll back!

[Figure: the same scenario with extended sequence numbers. Labels shown: (0, i), (0, i+1), SetL(0, i), SetR(C, B, (1, 0)), (1, 0); the recovered link carries (1, 0) while X still holds (0, i+1)]
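
One way to read the extended sequence number is as a pair compared field by field, recovery number first. That comparison order is an assumption here (the slides only say the recovery number increases on recovery), and `Seq` and `accept_set_l` are hypothetical names used for illustration.

```python
from typing import NamedTuple


class Seq(NamedTuple):
    recovery: int   # increased only on recovery
    seq: int        # ordinary per-link sequence number


def accept_set_l(lseq: Seq, msg_seq: Seq) -> bool:
    """Accept a SetL message only if its sequence number is strictly newer."""
    return msg_seq > lseq   # tuples compare left to right: recovery, then seq


i = 7                       # arbitrary value standing in for i on the slides
lseq = Seq(0, i)            # C's left sequence number before B fails
history = []
for msg in (Seq(1, 0), Seq(0, i + 1)):   # recovery first, then the delayed SetL
    accepted = accept_set_l(lseq, msg)
    if accepted:
        lseq = msg
    history.append((msg, accepted))
print(history, lseq)
```

Any sequence number issued after a recovery outranks every number issued before it, so the delayed pre-recovery SetL carrying (0, i+1) is rejected and C's left link stays on the recovered link.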

27 - Evaluation

Comparison

DDLL (without optimization)

DDLL (with optimization)

Atomic Ring Maintenance (distributed locking)

A. Ghodsi, “Distributed k-ary System: Algorithms for distributed hash tables,” PhD Dissertation, KTH Royal Institute of Technology, 2006.

Li’s algorithm (distributed locking, no finger table)

X. Li et al., “Concurrent maintenance of rings,” Distributed Comp., vol. 19, no. 2, pp. 126–148, 2006.

Chord (eventual consistency, no finger table)

I. Stoica et al., “Chord: A scalable peer-to-peer lookup protocol for internet applications,” IEEE/ACM Trans. on Net., vol. 11, no. 1, pp. 17–32, 2003.

28 - Eval | Insertion Sequence

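[Figure: insertion message sequences for u joining between p and q. Atomic Ring Maintenance: JoinReq, JoinPoint, NewSucc, NewSuccAck, JoinDone, with locks held (🔒/🔓). Li’s: Join(u), Grant(u), Ack(p, q), Done, with locks held (🔒/🔓). DDLL: SetR, SetRAck, SetL, with no locks]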

29 - Eval | Time for Concurrent Insertion

Simulated on a discrete event simulator

Insert an initial node

Insert n nodes in parallel (n = 1 to 100)

Measured time required to converge all links

Time includes lookup messages for searching node insertion position

[Plot: "Time to converge" (time unit = one-way message transmission time, axis 0 to 120) against # of simultaneously inserting nodes (0 to 100), for DDLL(Opt), DDLL(NoOpt), Atomic, Li's and Chord]

DDLL(Opt) converges quickly

30 - Eval | # of Msgs for Concurrent Insertion

Measured # of messages required to converge all links

[Plot: "# of messages to converge" (x 1000, axis 0 to 5) against # of simultaneously inserting nodes (0 to 100), for DDLL(Opt), DDLL(NoOpt), Atomic, Li's and Chord]

DDLL(Opt) uses fewer messages

32 - Conclusion

DDLL algorithm for constructing distributed doubly linked lists

No distributed locking

Right links are always correct, Left links converge quickly

Maintains lookup consistency (in absence of failure)

More efficient than conventional algorithms

Recovery procedure is provided

No FIFO channel is required

Correctness proofs for insertion and deletion procedure

DDLL is suitable for ring-based structured P2P networks

Real example: DDLL is used as a foundation of Skip Graph and Chord# implementations in PIAX P2P platform

33 - Spare Slides

34 - Recovery | Sequence Number (4)

X is excluded from the linked list but eventually returns

[Figure: X rejoins the list of A, B, C after recovery. Messages shown: SetR(X, C, (0, 0)) and SetRAck((1, 1)); sequence numbers shown: (1, 0), (0, i+1), (0, 0), (1, 1)]

35 - DDLL pseudo code

    process u
    var s: {out, ins, in, del}
        l, r: {pointer to a node or nil}
        lseq, rseq: {integer or nil}
    init s = out; l = r = nil; lseq = 0; rseq = nil
    begin
       {Create a linked list}
       (A1) receive Create() from app →
          l, r, s, lseq, rseq := u, u, in, 0, 0
       {Insert between p and q}
    [] (A2) receive Insert(p, q) from app →
          if (s ≠ out ∨ u ∉ (p, q)) then error; fi
          l, r, s := p, q, ins
          send SetR(u, r, lseq) to l
       {Delete}
    [] (A3) receive Delete() from app →
          if (s ≠ in) then error
          else if (u = r) then {in case of the last node}
             s := out
          else s := del; send SetR(r, u, rseq + 1) to l; fi
    [] (A4) receive SetR(rnew, rcur, rnewseq) from v →
          if (s = in ∧ r = rcur) then
             if (rnew = v) then {insertion case}
                send SetL(rnew, rseq + 1) to r
             else {deletion case}
                send SetL(u, rnewseq) to rnew; fi
             send SetRAck(rseq + 1) to v
             r, rseq := rnew, rnewseq
          else send SetRNak() to v; fi
    [] (A5) receive SetRAck(rnewseq) from v →
          if (s = ins) then
             s, rseq := in, rnewseq
          else if (s = del) then
             s := out; fi
    [] (A6) receive SetRNak() from v →
          if (s = ins) then
             s := out; error {app retries insertion later}
          else if (s = del) then
             s := in; error; fi {app retries deletion later}
    [] (A7) receive SetL(lnew, seq) from v →
          if (lseq < seq) then l, lseq := lnew, seq; fi
    end

Fig. 1: DDLL algorithm (without optimization)

(A2) u sets u’s left link and right link to p and q, respectively. u also sets u.s as ins to indicate u is inserting. u sends a SetR message to p, which contains u (as the new right node), q (as the expected current right node, or rcur), and zero (as the new right sequence number, or rnewseq).

(A4) On receiving the SetR message, p checks whether its status is in and rcur equals p.r. If the former is false, either p has not received a SetRAck message after its insertion (as we describe next, SetRAck message is to inform that node insertion or deletion is succeeded), or p has started its deletion. If the latter is false, it indicates either that another node has inserted at the right side of p, or that q has been deleted. In either [...] u (as the new left node) and p.rseq + 1 (= i + 1) (as the sequence number of the SetL message). Next, p sends a SetRAck message to u to notify that the insertion was successful. Because left(q) is changed from p to u, the incremented right sequence number for q should be transferred from p to u. For this purpose, the SetRAck message contains p.rseq + 1 (= i + 1). Finally, p changes p.r to u and p.rseq to 0 (rnewseq). Because u’s right link has already been set to q, the rightward linked list is never interrupted, even for a moment. Note that at this moment, p.rseq = u.lseq holds.

(A5) On receiving the SetRAck message, u confirms that u is successfully inserted. Node u updates u.s to in to indicate that u is inserted, and sets u.rseq to i + 1.

(A7) On receiving the SetL message, q compares the sequence number of the SetL message with q.lseq. If the former is larger (we assume this case), q updates q.l to u and q.lseq to i + 1. Otherwise, q ignores the message.

In the scenario above, it is assumed that a SetRAck message is sent to u in A4. If a SetRNak message is sent (i.e., in the case of insertion failure), then (A6) u.s is reverted to out and u retries the insertion procedure from locating its insertion position.

Note that a node u might receive a SetL message before receiving a SetRAck message. This happens, for example, when another node is inserted between p and u while the SetRAck message from p to u is still in transmission. This is normal and the algorithm can handle this situation. Actually we consider a node u becomes inserted at the moment when a SetRAck message is sent to u (see Section V).

[...] are executed.

Figure 3 depicts the situation where two nodes send a SetL message to the same node. There are 4 nodes A, B, C and D (A < B < C < D) and nodes A and D are initially inserted. A.rseq and D.lseq are i. Nodes B and C are then inserted in this order. When D receives the SetL message from C, its left link is updated to C and its left sequence number is updated to i + 2. When D later receives the SetL message from B, D ignores it because its sequence number (i + 1) is smaller than D’s left sequence number (i + 2). Thus, the receiving order of the SetL message does not affect the final results.

E. Deletion

Let us assume that node u, which is inserted between p and q, is going to be deleted. We also assume that both p.rseq and u.lseq are i1 and that both u.rseq