Understanding KaZaA Jian Liang Rakesh Kumar Keith Ross Polytechnic University Brooklyn, N.Y.


Understanding KaZaA

Jian Liang
Rakesh Kumar
Keith Ross

Polytechnic University
Brooklyn, N.Y.


Purpose of Measurement Study

• Try to understand highly successful file-sharing system
  – Overlay topology and dynamics
  – Peer selection
  – Index management


Big Picture of Overlay

• Two-layer hierarchy
  – Ordinary Node (ON)
  – Super Node (SN)

• SNs are generally more powerful machines (CPU, network bandwidth) and they are NOT behind NATs


FastTrack architecture

• Each ON has a parent SN
• For each shared file, ON uploads to parent SN:
  – Filename, ContentHash, file descriptors (metadata)
• Parent SN provides ON with “SN refresh list”
  – Up to 200 alive SNs, then stored at ON cache
  – For each SN, the list includes: IP address, port number, SN workload (defined as ?), freshness, and timestamp
• SNs also exchange SN refresh lists
• Each SN maintains local index for all children ONs
• Each SN maintains TCP connections with other SNs
  – Overlay net
• If an SN cannot answer a query, it forwards query to other SN peers
  – TTL-limited flooding

• Actual file transfer is directly between peers (not through overlay) using HTTP

• All signaling traffic is encrypted
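The index and query behavior described on this slide can be sketched as a toy model. This is only an illustration under stated assumptions, not the FastTrack implementation: the class `SuperNode`, the function `query`, keyword matching on filenames, and the breadth-first forwarding order are all hypothetical choices made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class SuperNode:
    """Toy SN: a local index over its child ONs plus links to neighbor SNs."""
    name: str
    neighbors: list = field(default_factory=list)  # other SuperNode objects
    index: dict = field(default_factory=dict)      # child ON -> [(filename, content_hash)]

    def register(self, child, files):
        """A child ON uploads metadata for each file it shares."""
        self.index[child] = list(files)

    def unregister(self, child):
        """Drop the child's metadata when it disconnects."""
        self.index.pop(child, None)

    def local_hits(self, keyword):
        """Search only this SN's local index (assumed: substring match on filename)."""
        return [(child, fname, chash)
                for child, files in self.index.items()
                for fname, chash in files
                if keyword.lower() in fname.lower()]


def query(start, keyword, ttl):
    """TTL-limited flooding: an SN that cannot answer forwards to its SN peers."""
    seen = {start.name}
    frontier = [start]
    hits = []
    for _hop in range(ttl + 1):
        next_frontier = []
        for sn in frontier:
            local = sn.local_hits(keyword)
            if local:
                hits.extend(local)       # answered locally, no forwarding needed
                continue
            for nb in sn.neighbors:      # cannot answer: forward one hop further
                if nb.name not in seen:
                    seen.add(nb.name)
                    next_frontier.append(nb)
        frontier = next_frontier
    return hits


# Usage: a chain of three SNs; only the last one indexes a matching file.
a, b, c = SuperNode("a"), SuperNode("b"), SuperNode("c")
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]
c.register("on3", [("song.mp3", "h1")])
print(query(a, "song", ttl=1))  # TTL too small to reach c
print(query(a, "song", ttl=2))
```

The result only names the ON that holds the file; the transfer itself would then go directly between peers over HTTP, outside the overlay.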


Measurement Apparatus

• KaZaA Sniffing Platform
• KaZaA Probing Tool


KaZaA Sniffing Platform

• Poly (Ethernet)
• Home (cable modem)


KaZaA Probing Tool

• Campus & home based probing
  – Probe arbitrary SNs
  – Retrieve their SN refresh lists
  – Obtain workload of probed SN

[Figure: probes at the Poly and Home sites contacting SNs in the KaZaA network, which is drawn as SNs with attached ONs; three SNs carry partial address labels (128., 24., 213.).]


Signaling Protocol

[Figure: message sequence of the signaling protocol. Labels in the diagram: ON-SN session initial; SN-SN session initial; (repeat for 5 SNs); SN–SN node list fragments 1, 2, ..., n [Enc].]


TCP Connections Evolution at instrumented SN node

[Figure: number of ON-SN and SN-SN TCP connections over time at the instrumented SN (legend: on-sn, sn-sn). Two panels: "Poly campus 4 – 6 hour measurement" and "Cable modem 7-11 hour measurement".]


Some basic calculations

• Estimate total number of SNs, assuming about 3M users (typical in 2004)

• About 25000-40000 SNs

• Estimate probability of SN-SN link

• About 0.1%
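The slide does not spell out how these two estimates are obtained. One plausible back-of-the-envelope reading is sketched below: divide the user population by the number of child ONs per SN, then divide the number of SN-SN connections held by one SN by the number of other SNs. The inputs `children_per_sn` (75 to 120) and `sn_neighbors` (30) are hypothetical values chosen only to show the arithmetic; they are not measurements taken from the slides.

```python
def estimate_supernodes(total_users, children_per_sn):
    """Number of SNs if every user is a child of exactly one SN (assumption)."""
    return total_users / children_per_sn


def link_probability(sn_neighbors, total_sns):
    """Chance that a given pair of SNs is connected, if links are spread uniformly."""
    return sn_neighbors / (total_sns - 1)


users = 3_000_000                            # "about 3M users" from the slide
low = estimate_supernodes(users, 120)        # hypothetical: 120 children per SN
high = estimate_supernodes(users, 75)        # hypothetical: 75 children per SN
p = link_probability(30, 30_000)             # hypothetical: 30 SN-SN links, 30,000 SNs

print(f"{low:.0f} to {high:.0f} SNs")        # 25000 to 40000 SNs
print(f"link probability about {p:.2%}")     # link probability about 0.10%
```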


Signaling Sessions Lifetime

• Measured over a period of 12 hours

• Avg duration: 34 min (ON-SN) and 11 min (SN-SN)

• 30-40% of connections (both types) last for less than 30 seconds!

• What causes short-lived ON-SN connections?

• What causes short-lived SN-SN connections?
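Statistics like the ones on this slide can be derived from a list of observed connection durations. The sketch below is a minimal illustration; the function name `session_stats` and the sample durations are made up for the example and are not data from the study.

```python
from statistics import mean


def session_stats(durations_s, short_threshold_s=30):
    """Return (average duration in minutes, fraction of sessions under the threshold)."""
    avg_min = mean(durations_s) / 60
    short = sum(1 for d in durations_s if d < short_threshold_s)
    return avg_min, short / len(durations_s)


# Made-up durations in seconds: two very short sessions and three long ones.
sample = [5, 20, 600, 3600, 1800]
avg_min, short_frac = session_stats(sample)
print(f"avg {avg_min:.1f} min, {short_frac:.0%} shorter than 30 s")
```

A few long-lived sessions can dominate the average even when a large share of sessions is very short, which is why the slide reports both numbers.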


Parent selection

• Recall that an ON receives a list of up to 200 SNs from its parent SN
  – Then, it can select a new parent

• How would you select the parent SN?


SN workload vs # of connections

[Figure: four panels. Two plot the number of TCP connections over time (legend: on-sn, sn-sn, total), captioned "7 - 11 hours TCP connections evolution"; two plot the SN workload value over time (legend: workload), captioned "7 - 11 hours workload values evolution".]


Peer Selection: the workload of the SN clearly matters


Locality in Peer Selection

(Graphs show the percentage of SNs in the SN list having a common prefix with the child ON and the parent SN)
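The quantity plotted here can be computed as sketched below. The slides do not state which prefix length counts as a "common prefix", so `min_bits` is an assumption, and the addresses are made up for illustration.

```python
import ipaddress


def common_prefix_len(a, b):
    """Number of leading bits shared by two IPv4 addresses."""
    x = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - x.bit_length()


def pct_sharing_prefix(sn_list, ref_addr, min_bits=8):
    """Percentage of SNs in sn_list sharing at least min_bits leading bits with ref_addr."""
    if not sn_list:
        return 0.0
    n = sum(1 for sn in sn_list if common_prefix_len(sn, ref_addr) >= min_bits)
    return 100.0 * n / len(sn_list)


# Made-up SN refresh list and reference address (e.g. the child ON).
sn_list = ["128.238.9.9", "24.1.2.3", "128.10.0.1", "213.5.5.5"]
print(pct_sharing_prefix(sn_list, "128.238.1.1", min_bits=8))   # 50.0
print(pct_sharing_prefix(sn_list, "128.238.1.1", min_bits=16))  # 25.0
```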


Peer Selection: it appears that RTT also matters:

40% of ON-SN connections have RTT<5ms

60% of SN-SN connections have RTT<50ms


Index Management

1) No index exchange between SNs

2) SN purges metadata of ON as soon as that child disconnects from parent

3) Highly skewed contribution of metadata by different peers
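The skew in item 3 can be quantified as the share of all metadata records contributed by the most active fraction of peers. The sketch below assumes one record count per peer; the function name `top_share` and the counts are invented for illustration and are not measurements from the study.

```python
def top_share(counts, top_fraction=0.2):
    """Fraction of all metadata records contributed by the top `top_fraction` of peers."""
    ordered = sorted(counts, reverse=True)
    k = max(1, round(len(ordered) * top_fraction))  # at least one peer
    total = sum(ordered)
    return sum(ordered[:k]) / total if total else 0.0


# Made-up per-peer record counts for ten child ONs.
counts = [40, 25, 10, 8, 6, 4, 3, 2, 1, 1]
print(f"top 20% of peers contribute {top_share(counts):.0%} of the records")
```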


Summary of Results

• 20,000 ~ 40,000 active supernodes
• Each SN connects to approx. 0.1% of other SNs
• Highly dynamic connections: over 35% of SN-SN durations are less than 30 sec.


Summary of results

• Peer selection uses IP prefix match, workload, RTT and freshness

• No index exchange between SNs, but query forwarding

• Skewed content distribution: 20% of peers provide 70% of the metadata for sharing
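The slides say which factors influence peer selection but not how KaZaA combines them. The sketch below shows one hypothetical way an ON could rank candidate SNs using those four factors. The lexicographic ordering, the /16 prefix test, the field names of `SNEntry`, and the reading of freshness as "higher is better" are all assumptions of this example, not findings of the study.

```python
from dataclasses import dataclass
import ipaddress


@dataclass
class SNEntry:
    """One candidate SN, loosely modeled on an SN refresh list entry."""
    ip: str
    port: int
    workload: int     # lower is assumed to be better
    freshness: int    # assumed: higher means seen alive more recently
    rtt_ms: float     # assumed to be measured by the ON itself


def same_prefix(on_ip, sn_ip, prefix_bits=16):
    """True if the SN lies in the ON's /prefix_bits network (hypothetical locality test)."""
    net = ipaddress.ip_network(f"{on_ip}/{prefix_bits}", strict=False)
    return ipaddress.ip_address(sn_ip) in net


def pick_parent(on_ip, candidates):
    """Prefer a prefix match, then low workload, then low RTT, then high freshness."""
    return min(
        candidates,
        key=lambda sn: (not same_prefix(on_ip, sn.ip), sn.workload, sn.rtt_ms, -sn.freshness),
    )


# Made-up candidates: two local SNs with different workload, one remote idle SN.
candidates = [
    SNEntry("128.238.5.5", 1214, workload=80, freshness=10, rtt_ms=4.0),
    SNEntry("128.238.9.9", 1214, workload=20, freshness=5, rtt_ms=6.0),
    SNEntry("24.1.1.1", 1214, workload=5, freshness=9, rtt_ms=40.0),
]
print(pick_parent("128.238.1.1", candidates).ip)  # 128.238.9.9
```

A weighted score would be an equally valid design; the strict ordering is used here only because it keeps the example easy to trace by hand.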