Understanding KaZaA
Jian Liang, Rakesh Kumar, Keith Ross
Polytechnic University, Brooklyn, N.Y.
Purpose of Measurement Study
• Try to understand a highly successful file-sharing system
– Overlay topology and dynamics
– Peer selection
– Index management
Big Picture of Overlay
• Two layer hierarchy
– Ordinary Node (ON)
– Super Node (SN)
• SNs are generally more powerful machines (CPU, network bw) and they are NOT behind NATs
FastTrack architecture
• Each ON has a parent SN
• For each shared file, the ON uploads to its parent SN:
– Filename, ContentHash, file descriptors (metadata)
• Parent SN provides the ON with an “SN refresh list”
– Up to 200 alive SNs, then stored in the ON's cache
– For each SN, the list includes: IP address, port number, SN workload (defined as ?), freshness, and timestamp
• SNs also exchange SN refresh lists
• Each SN maintains a local index for all of its children ONs
• Each SN maintains TCP connections with other SNs
– Overlay net
• If an SN cannot answer a query, it forwards the query to other SN peers
– TTL-limited flooding
• Actual file transfer is directly between peers (not through overlay) using HTTP
• All signaling traffic is encrypted
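The query-forwarding rule above can be illustrated with a minimal sketch. It is not KaZaA's actual (encrypted, proprietary) protocol: the function name `flood_query`, the dictionary-based overlay, and the substring match on filenames are all assumptions made for illustration.

```python
from collections import deque

def flood_query(overlay, index, start_sn, keyword, ttl):
    """Return (sn, filename) hits found by TTL-limited flooding.

    overlay: dict mapping each SN to its neighboring SNs (hypothetical model).
    index:   dict mapping each SN to the filenames of its children ONs.
    """
    hits = []
    visited = {start_sn}
    queue = deque([(start_sn, ttl)])
    while queue:
        sn, remaining = queue.popleft()
        # An SN first tries to answer from its own local index.
        local = [name for name in index.get(sn, []) if keyword in name]
        if local:
            hits.extend((sn, name) for name in local)
            continue  # answered locally, so this SN does not forward
        if remaining == 0:
            continue  # TTL exhausted, the query dies here
        for neighbor in overlay.get(sn, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits

# Toy chain of four SNs; only D indexes a matching file.
overlay = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
index = {"D": ["song.mp3"]}

near = flood_query(overlay, index, "A", "song", ttl=2)
far = flood_query(overlay, index, "A", "song", ttl=3)
```

With a TTL of 2 the query never reaches D, so `near` is empty; raising the TTL to 3 lets the query travel one hop further and find the file.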
Measurement Apparatus
• KaZaA Sniffing Platform
• KaZaA Probing Tool
KaZaA Sniffing Platform
• Poly (Ethernet)
• Home (cable modem)
KaZaA Probing Tool
• Campus & home based probing
– Probe arbitrary SNs
– Retrieve their SN refresh lists
– Obtain workload of probed SN
[Figure: probes sent from the Home and Poly hosts into the KaZaA network; nodes are labeled SN (including SN 128., SN 24., SN 213.) and ON]
Signaling Protocol
[Figure: signaling message sequence; labels: “ON-SN session initial”, “SN-SN session initial”, “(repeat for 5 SNs)”, and “SN–SN Node list fragment 1, 2, ..., n [Enc]”]
TCP Connections Evolution at instrumented SN node
[Figure: number of TCP connections over time at the instrumented SN, with series “on-sn” and “sn-sn”. Two panels: Poly campus 4 – 6 hour measurement; Cable modem 7-11 hour measurement]
Some basic calculations
• Estimate total number of SNs, assuming about 3M users (typical in 2004)
• About 25000-40000 SNs
• Estimate probability of SN-SN link
• About 0.1%
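A back-of-the-envelope sketch of how such estimates can be obtained. The slides give only the 3M user count and the results; the per-SN child counts (75 and 120) and the SN-SN neighbor count (30) below are illustrative assumptions chosen to reproduce the stated orders of magnitude, not measured values.

```python
def estimate_sn_count(total_users, children_per_sn):
    """Number of SNs needed if every SN serves children_per_sn ONs."""
    return total_users / children_per_sn

def link_probability(sn_neighbors, sn_count):
    """Chance that a given pair of SNs is connected, assuming each SN
    keeps sn_neighbors links spread uniformly over the other SNs."""
    return sn_neighbors / (sn_count - 1)

users = 3_000_000                        # from the slide
low = estimate_sn_count(users, 120)      # assumed 120 children per SN
high = estimate_sn_count(users, 75)      # assumed 75 children per SN
p = link_probability(30, 30_000)         # assumed 30 SN-SN links, 30000 SNs

print(f"SNs: {low:.0f} to {high:.0f}, link probability about {p:.2%}")
```

Under these assumptions the SN count falls between 25000 and 40000, and the link probability comes out near 0.1%.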
Signaling Sessions Lifetime
• Measured over a period of 12 hours
• Avg duration: 34 min (ON-SN) and 11 min (SN-SN)
• 30-40% of connections (both types) last for less than 30 seconds!
• What causes short-lived ON-SN connections?
• What causes short-lived SN-SN connections?
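For readers who want to compute the same two statistics from their own connection trace, here is a minimal sketch. The function name `summarize_sessions` and the sample durations are made up for illustration; they are not the measured data.

```python
from statistics import mean

def summarize_sessions(durations_sec, short_threshold_sec=30):
    """Return (average duration in minutes, fraction of short sessions)."""
    avg_min = mean(durations_sec) / 60
    short = sum(d < short_threshold_sec for d in durations_sec)
    return avg_min, short / len(durations_sec)

# Hypothetical session durations in seconds (not from the study).
sample = [5, 12, 25, 600, 1800, 3600, 7200, 40, 90, 10]
avg_min, short_fraction = summarize_sessions(sample)
```

A few very long sessions dominate the average, which is why a trace can show a mean of tens of minutes while a large share of sessions still ends within 30 seconds.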
Parent selection
• Recall that an ON receives a list of up to 200 SNs from its parent SN
– Then, it can select a new parent
• How would you select the parent SN?
SN workload vs # of connections
[Figure: paired plots over time of TCP connection counts (series “on-sn”, “sn-sn”, “total”) and SN workload values. Panel captions: “7 - 11 hours TCP connections evolution”; “7 - 11 hours workload values evolution”]
Peer Selection: the workload of the SN clearly matters
Locality in Peer Selection: (graphs show percentage of SNs in the SN list having common prefix with child ON and parent SN)
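One way to compute such a percentage is sketched below. The slides do not say which prefix length was used, so the 8-bit threshold is an assumption, as are the function names; the addresses come from documentation-only ranges.

```python
import ipaddress

def common_prefix_bits(ip_a, ip_b):
    """Number of leading bits shared by two IPv4 addresses (0 to 32)."""
    a = int(ipaddress.IPv4Address(ip_a))
    b = int(ipaddress.IPv4Address(ip_b))
    # XOR leaves a 1 at the first differing bit; everything above it matches.
    return 32 - (a ^ b).bit_length()

def fraction_sharing_prefix(reference_ip, sn_ips, min_bits=8):
    """Fraction of SNs sharing at least min_bits leading bits with reference_ip."""
    matches = sum(common_prefix_bits(reference_ip, ip) >= min_bits
                  for ip in sn_ips)
    return matches / len(sn_ips)

# Hypothetical SN refresh list.
sn_list = ["192.0.2.129", "198.51.100.1", "192.0.2.7", "203.0.113.5"]
share = fraction_sharing_prefix("192.0.2.1", sn_list, min_bits=8)
```

Running the same function once with the child ON's address and once with the parent SN's address as `reference_ip` gives the two quantities the graphs compare.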
Peer Selection: it appears that RTT also matters:
40% of ON-SN connections have RTT<5ms
60% of SN-SN connections have RTT<50ms
Index Management:
1) No index exchange between SNs
2) SN purges metadata of ON as soon as that child disconnects from parent
3) Highly skewed contribution of metadata by different peers
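Points 1 and 2 can be pictured with a small sketch of a purely local index. The class `SuperNodeIndex` and its method names are hypothetical; the real data structures inside KaZaA are not described in the slides.

```python
class SuperNodeIndex:
    """Local index held by one SN; nothing here is shared with other SNs."""

    def __init__(self):
        self._files_by_child = {}

    def register(self, child_id, filename, content_hash):
        """Record metadata that a child ON uploaded for one shared file."""
        self._files_by_child.setdefault(child_id, []).append(
            (filename, content_hash))

    def disconnect(self, child_id):
        """Purge all of a child's metadata as soon as it disconnects."""
        self._files_by_child.pop(child_id, None)

    def search(self, keyword):
        """Return (child_id, filename) pairs whose filename contains keyword."""
        return [(child, name)
                for child, files in self._files_by_child.items()
                for name, _ in files
                if keyword in name]

sn = SuperNodeIndex()
sn.register("on-1", "song.mp3", "hash-a")
sn.register("on-2", "talk.pdf", "hash-b")
before = sn.search("song")
sn.disconnect("on-1")
after = sn.search("song")
```

Once `on-1` disconnects, its files no longer appear in search results, which matches the purge-on-disconnect behavior listed above.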
Summary of Results
• 20,000 ~ 40,000 active supernodes
• Each SN connects to approx. 0.1% of other SNs
• Highly dynamic connections: over 35% of SN-SN connections last less than 30 sec.
Summary of results
• Peer selection uses IP prefix match, workload, RTT and freshness
• No index exchange between SNs, but query forwarding
• Skewed content distribution: 20% of peers provide 70% of the metadata for sharing
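A skew figure of this kind can be computed from per-peer metadata counts as in the sketch below. The function name `top_share` and the counts are invented for illustration and are not the study's data.

```python
def top_share(counts, top_fraction=0.2):
    """Share of all items contributed by the top_fraction largest contributors."""
    ordered = sorted(counts, reverse=True)
    k = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:k]) / sum(ordered)

# Hypothetical number of metadata records uploaded by each of ten peers.
records_per_peer = [350, 350, 60, 50, 40, 40, 30, 30, 30, 20]
share_of_top_20_percent = top_share(records_per_peer)
```

In this toy input the two largest contributors (20% of the peers) account for 700 of the 1000 records.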