Rapid and Massive Monitoring of DHT: Crawling 10 Million Nodes in 24 Hours
PacSec Tokyo November 2011
In this presentation, we present our high-speed DHT crawler, which monitors 10 million nodes in 24 hours!
Ruo Ando, NICT (National Institute of Information and Communications Technology)
Takayuki Sugiura, NetAgent Co. Ltd.
Overview: detecting illegal use in a huge network
• BitTorrent has become an irreplaceable network application for distributing software and content. But...
• No one knows its exact scale and dynamics! How many nodes join and leave the BitTorrent network in 24 hours?
• The BitTorrent network is huge, and no one knows where (potential) security incidents and illegal use have occurred!
• We tackled the challenge of monitoring this largest-scale network with our rapid and massive DHT crawler.
• We succeeded in observing 10,000,000 nodes in 24 hours!
• We also present a visualization of the dynamics of the BitTorrent network.
PacSec 2011
Demo: observed nodes
10 million nodes in 24 hours!
BT: the largest file-sharing network in the world
BitTorrent is estimated to have 70 million active users and 100 million total users, and the number is still increasing!
BitTorrent is expanding and is now everywhere!
● BitTorrent in portable USB storage devices: http://www.iodata.com/
● Android: BitTorrent Client | aBTC, available for about $5! https://market.android.com/
The old-new problem: illegal content downloads
BitTorrent is one of the most efficient ways to share large files such as operating system ISO images.
Unfortunately, BT is at the same time a very efficient way to download protected (copyrighted) content such as movies and music illegally.
The biggest BitTorrent case: in 2010, the US Copyright Group (USCG) said that 23,322 IP addresses had allegedly infringed the copyright of the movie The Expendables. The settlement is around $3,000 per infringement.
The case of LimeWire, October 2010
● In October 2010, a New York judge ordered LimeWire to shut down its file-sharing software.
The US federal court judge ruled that LimeWire's service was used as software for infringing copyrighted content.
● Soon afterwards, a new version of LimeWire called LPE (LimeWire Pirate Edition) was released by anonymous creators as a resurrection.
Right to be deleted or forgotten?
November 2010: the EC announced its plan to set out a strategy for strengthening EU data protection rules.
People in the EU basically recognize the current pervasive use of BitTorrent and regard its potential as promising. They would also like BitTorrent to be used legally.
Dot-P2P: domain seizures and BT-based DNS
● June 2010: WikiLeaks leverages torrents and magnet links for distributing files.
● U.S. Immigration and Customs Enforcement (ICE) seizes the site domain of a BT meta-search engine.
● The U.S. proposed the Combating Online Infringement and Counterfeits Act (COICA), which would allow the Department of Justice to order a domain registrar to take a domain offline. COICA is aimed at increasing the government's censorship powers.
● In direct response to the domain seizures by US authorities, the Dot-P2P project proposes a DNS service independent of ICANN and ISPs.
● In the Dot-P2P system, a request for the .p2p TLD is redirected to a locally hosted DNS database. The traffic is encrypted and sent according to the BitTorrent protocol, so the .p2p TLD is decentralized and independent of ICANN and of any ISP's DNS service.
BitTorrent history
Bram Cohen started implementing BitTorrent in 2001.
He released the client software in 2003.
In 2003, a user in the EU released an ISO image of Red Hat, and the image was downloaded 30,000 times in 3 days.
In 2004, he formed BitTorrent Inc., and by mid-2005 BitTorrent Inc. had received venture capital funding.
BitTorrent traffic estimates
① "55%" - CableLabs: about half of the upstream traffic of CATV.
② "35%" - CacheLogic: "LIVEWIRE - File-sharing network thrives beneath the Radar"
③ "60%" - documents at www.sans.edu: "It is estimated that more than 60% of the traffic on the internet is peer-to-peer."
Basic architecture of a tracker network
① Ask: Node A (a newcomer) asks the tracker to search for the file.
② Torrent download: the tracker provides the torrent file.
③ Join: Node A queries Node B.
④ Download: Node A downloads pieces of the file from the swarm.
A seeder has the complete file. A leecher has only pieces of the file.
BitTorrent network: tracker or DHT (trackerless)
Tracker: a dedicated machine that stores torrent files and keeps track of which nodes are downloading and uploading.
DHT: a decentralized network architecture that shares out the functionality of the tracker. DHT is decentralized, but more scalable than pure P2P.
A DHT (Distributed Hash Table) is a lookup method based on <key,value> pairs. A DHT lookup lets us discover the location of the node that shares the tracker's responsibility for a given file.
The DHT network has recently attracted much attention because of the Dot-P2P project and The Pirate Bay's confirmation that it is shutting down its tracker.
DHT Protocol
● DHT is not a new spec: it was introduced in Azureus (2005) and BitComet (2005).
● Based on Kademlia, an XOR-based DHT: Petar Maymounkov and David Mazières. Kademlia: A peer-to-peer information system based on the XOR metric. In Proceedings of the 1st International Workshop on Peer-to-Peer Systems (IPTPS '02).
● Supported by many client apps: uTorrent 1.8.5, Vuze 4.3.0.2, BitTorrent 6.3, BitComet 1.16, Transmission 1.76
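The XOR metric behind Kademlia can be illustrated with a short sketch. This is a minimal illustration, not the crawler's code: the function names are hypothetical, and toy 1-byte IDs are used for readability where real BitTorrent DHT node IDs are 20 bytes.

```python
def xor_distance(id_a: bytes, id_b: bytes) -> int:
    """Kademlia distance: the two IDs XORed and read as an unsigned integer."""
    if len(id_a) != len(id_b):
        raise ValueError("node IDs must have the same length")
    return int.from_bytes(id_a, "big") ^ int.from_bytes(id_b, "big")


def closest_nodes(target: bytes, node_ids: list[bytes], k: int = 8) -> list[bytes]:
    """Return the k node IDs closest to target under the XOR metric."""
    return sorted(node_ids, key=lambda node_id: xor_distance(node_id, target))[:k]


# Toy 1-byte IDs (assumption for readability; real IDs are 160 bits).
nodes = [bytes([n]) for n in (0b0001, 0b0100, 0b0111, 0b1100)]
print(closest_nodes(bytes([0b0101]), nodes, k=2))
```

A routing table built on this metric answers a lookup with the K entries that minimize the XOR distance to the requested ID, which is what a crawler repeatedly asks for.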
DHT Protocol
● Magnet links are URLs that enable each node to download and/or distribute content without querying a tracker site.
● Magnet links are provided by The Pirate Bay and Mininova to speed up downloads (the hash is base32 or hex encoded).
● 2010: The Pirate Bay moves to magnet-link-oriented DHT, shutting down its server.
● Do magnet links make the BitTorrent network tracker-less?
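As a rough sketch of how the two encodings mentioned above can be handled, the following extracts the 20-byte info hash from a magnet link. The function name is hypothetical; the sketch assumes the hash sits in an `xt=urn:btih:` parameter and is either 40 hex characters or 32 base32 characters.

```python
import base64
from urllib.parse import parse_qs


def info_hash_from_magnet(magnet: str) -> bytes:
    """Extract the 20-byte info hash from the xt=urn:btih: parameter."""
    query = magnet.partition("?")[2]          # text after "magnet:?"
    for xt in parse_qs(query).get("xt", []):
        if xt.lower().startswith("urn:btih:"):
            encoded = xt[len("urn:btih:"):]
            if len(encoded) == 40:            # hex encoding
                return bytes.fromhex(encoded)
            if len(encoded) == 32:            # base32 encoding
                return base64.b32decode(encoded.upper())
    raise ValueError("no BitTorrent info hash found in magnet link")
```

The resulting 20 bytes are the key that a node looks up in the DHT instead of asking a tracker.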
DHT Protocol
The DHT network is a scalable architecture for file-sharing systems. Pure P2P: hundreds of thousands of nodes. DHT: millions of nodes.
The BitTorrent DHT network is implemented over KRPC. The KRPC protocol is an RPC mechanism over UDP.
DHT queries come in four kinds: ping, find_node, get_peers and announce_peer. Each is encoded with bencode.
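To make the encoding concrete, here is a minimal bencode encoder and a ping query built with it. The helper names are hypothetical; the message layout (keys `t`, `y`, `q`, `a`) follows the published BitTorrent DHT specification (BEP 5) rather than anything shown on the slides, and sending the bytes over UDP is left out.

```python
def bencode(value) -> bytes:
    """Encode ints, strings, lists and dicts in bencode."""
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("ascii")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("ascii") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def ping_query(node_id: bytes, transaction_id: bytes = b"aa") -> bytes:
    """Build a KRPC ping query carrying the querying node's 20-byte ID."""
    return bencode({"t": transaction_id, "y": "q", "q": "ping",
                    "a": {"id": node_id}})


print(ping_query(b"abcdefghij0123456789"))
```

The same `bencode` helper would serve for find_node, get_peers and announce_peer by changing the `q` value and the argument dictionary.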
DHT Protocol
Kademlia defines four RPCs: PING, STORE, FIND_NODE and FIND_VALUE. The BitTorrent DHT counterparts are ping, find_node, get_peers and announce_peer.
• PING: the basic query for checking that the queried node is alive. Its single argument is the querying node's ID, a 20-byte string in network byte order.
• FIND_NODE: used to obtain the contact information of a node given its ID. The response contains a key "nodes" whose value is the compact node info for the target node, or for the K (8) closest nodes in the queried node's routing table.
arguments: {"id" : "<querying nodes id>", "target" : "<id of target node>"}
response: {"id" : "<queried nodes id>", "nodes" : "<compact node info>"}
DHT Protocol
The remaining two BitTorrent DHT queries are get_peers and announce_peer.
• GET_PEERS: used to look up peers for a torrent infohash. If the queried node has peers for the infohash, the response contains a key "values" holding a list of strings. If not, the response contains a key "nodes" holding the K nodes in the queried node's routing table closest to the infohash.
• ANNOUNCE_PEER: used to announce that the peer controlling the querying node is downloading a torrent on a port.
arguments: {"id" : "<querying nodes id>", "info_hash" : "<20-byte infohash of target torrent>", "port" : <port number>, "token" : "<opaque token>"}
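The two possible outcomes of get_peers can be told apart as in the sketch below. It assumes the response dictionary has already been bdecoded into Python objects with byte-string keys, and that each entry of "values" is a 6-byte compact peer (4-byte IPv4 address plus 2-byte port) as in BEP 5. The function name is hypothetical.

```python
import socket
import struct


def handle_get_peers_response(response: dict):
    """Return ("peers", [(ip, port), ...]) or ("nodes", compact_node_info)."""
    if b"values" in response:
        peers = []
        for entry in response[b"values"]:
            ip_raw, port = struct.unpack("!4sH", entry)  # network byte order
            peers.append((socket.inet_ntoa(ip_raw), port))
        return "peers", peers
    # No peers known: the queried node points us at closer nodes instead.
    return "nodes", response.get(b"nodes", b"")
```

A crawler is mostly interested in the second branch, since every "nodes" answer reveals more of the network.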
Monitoring system architecture
[Figure: several DHT crawlers (scaled out) crawl the DHT network and write to a key-value store, where <key> = node ID and <value> = data (address, port, etc.). The dumped data is then processed by Map, Shuffle and Reduce stages.]
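The key-value layout in the figure can be sketched with an in-memory stand-in. This is only an illustration under stated assumptions: the class and field names are hypothetical, a plain dictionary replaces whatever key-value store the system actually uses, and the dump format is invented for the example.

```python
from dataclasses import dataclass


@dataclass
class NodeRecord:
    address: str
    port: int
    times_seen: int = 1


class NodeStore:
    """In-memory stand-in for the key-value store: key = node ID."""

    def __init__(self) -> None:
        self._records: dict[bytes, NodeRecord] = {}

    def put(self, node_id: bytes, address: str, port: int) -> bool:
        """Store a node; return True if this node ID was not seen before."""
        record = self._records.get(node_id)
        if record is None:
            self._records[node_id] = NodeRecord(address, port)
            return True
        record.address, record.port = address, port
        record.times_seen += 1
        return False

    def __len__(self) -> int:
        return len(self._records)

    def dump(self):
        """Yield one CSV-like line per node for later batch processing."""
        for node_id, rec in self._records.items():
            yield f"{node_id.hex()},{rec.address},{rec.port}"
```

Keying on the node ID means that answers from several crawlers about the same node collapse into one record, so the store size is the number of distinct nodes observed.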
Scaling out crawlers!
[Figure: multiple DHT crawlers running on a hypervisor and crawling the DHT network.]
The response contains, under the key "nodes", the compact node info for the target node or for the K (8) closest nodes in the queried node's routing table.
The returned node info and the K (8) entries should be randomly distributed.
So scaling out crawlers is an effective way to expand the monitoring range!
The DHT crawlers run on virtualized Linux images.
The hypervisor is VMware ESX, which provides a rich interface for managing the crawlers.
Hadoop & MapReduce
[Figure: dump data flows through Map, Shuffle and Reduce stages (scaled out).]
Hadoop & MapReduce, running on Linux RH, are used for:
Retrieval: geolocation, domain name
Translation: KML (XML)
Ranking: word count, sorting
Rapid crawling: 24 hours to reach 10,000,000 nodes!
[Figure: two plots against elapsed hours (0 to 26). "node": number of observed nodes, axis from 0 to 12,000,000. "diff": axis from 10,000 to 1,000,000.]
Visualization & ranking
Crawler log records (fields include the IP address and the time):
*.*.39.201,6881,2011/9/25 23:57:43,1
*.*.210.128,62845,2011/9/25 23:56:32,1
*.*.33.212,6881,2011/9/25 23:33:58,1
*.*.9.21,49924,2011/9/25 23:37:02,1
From each record we derive location info (country, city, lat/lng) and the domain name, which feed the KML movie and the ranking.
[Figure: chart for GB, RU, JP, CN and US; x axis 1 to 12, y axis 0 to 250.]
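A log in the format shown above can be parsed as follows. This is a sketch under assumptions: the second field is taken to be the port, the meaning of the last field is not stated on the slide so it is kept as an opaque string, and the function name is hypothetical.

```python
import csv
from datetime import datetime


def parse_log(lines):
    """Parse 'ip,port,timestamp,last_field' records into dictionaries."""
    records = []
    for row in csv.reader(lines):
        if len(row) != 4:
            continue  # skip malformed lines
        ip, port, seen_at, last_field = row
        records.append({
            "ip": ip,  # masked in the slides, so kept as text
            "port": int(port),
            "time": datetime.strptime(seen_at, "%Y/%m/%d %H:%M:%S"),
            "last_field": last_field,  # meaning not given in the source
        })
    return records


sample = ["*.*.39.201,6881,2011/9/25 23:57:43,1",
          "*.*.210.128,62845,2011/9/25 23:56:32,1"]
for record in parse_log(sample):
    print(record["ip"], record["port"], record["time"].isoformat())
```

The parsed timestamp is what later drives the time-based animation, and the IP address is the input to the geolocation and reverse DNS lookups.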
MapReduce
[Figure: Input feeds three Map tasks, whose results feed three Reduce tasks, which produce the Output.]
MapReduce is a programming model for coping with big data.
map(key1, value1) -> list<key2, value2>
reduce(key2, list<value2>) -> list<value3>
Reference: Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI '04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December 2004.
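The two signatures above can be exercised with a single-process sketch. It is not Hadoop: the `map_reduce` driver is a hypothetical stand-in that runs the map, shuffle and reduce stages in memory, and the country-code input is made up for the example.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run map, shuffle and reduce over (key1, value1) records in memory."""
    # Map: each record yields a list of (key2, value2) pairs.
    intermediate = []
    for key1, value1 in records:
        intermediate.extend(mapper(key1, value1))
    # Shuffle: group all value2 items by key2.
    groups = defaultdict(list)
    for key2, value2 in intermediate:
        groups[key2].append(value2)
    # Reduce: each key2 with its list of values yields a list of value3.
    return {key2: reducer(key2, values) for key2, values in sorted(groups.items())}


# Count observations per country code (illustrative input).
records = [(0, "RU"), (1, "US"), (2, "RU")]
result = map_reduce(records,
                    mapper=lambda _line_no, code: [(code, 1)],
                    reducer=lambda _code, ones: [sum(ones)])
print(result)
```

On a cluster the three stages run on different machines, but the contract between mapper and reducer is the same as in this sketch.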
Map
Input log lines (IP address, reverse DNS name):
*.*.194.107,h116-0-194-107.catv02.itscom.jp
*.*.27.107,c-76-28-27-107.hsd1.ct.comcast.net
*.*.239.181,c-68-40-239-181.hsd1.mi.comcast.net
*.*.44.184,pool-96-253-44-184.prvdri.fios.verizon.net
*.*.170.168,cpc11-stok15-2-0-cust167.1-4.cable.virginmedia.com
*.*.23.81,cpc2-stkn10-0-0-cust848.11-2.cable.virginmedia.com
[Figure: words extracted from the log (hsd1, comcast, verizon, virginmedia, ...), each paired with the value 1.]
Each log string is divided into words, and each word is assigned the value 1: key-value {word, 1}.
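The map step for these log lines might look like the sketch below. The splitting rule is an assumption: the domain rankings in this deck contain hyphenated words such as "adsl-pool", which suggests the names are split on dots only. The function name is hypothetical.

```python
def map_hostname_words(line: str) -> list[tuple[str, int]]:
    """Map step: emit (word, 1) for each dot-separated label of the host name."""
    _ip, _, hostname = line.partition(",")
    words = [label for label in hostname.strip().lower().split(".") if label]
    return [(word, 1) for word in words]


print(map_hostname_words("*.*.27.107,c-76-28-27-107.hsd1.ct.comcast.net"))
```

Splitting on dots keeps provider labels such as "comcast" or "hsd1" intact, so summing the 1s per word directly yields a provider ranking.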
Reduce
[Figure: the {word, 1} pairs from the Map step are grouped by word (hsd1, comcast, verizon).]
Reduce: sum the 1s for each word, giving key-value pairs such as {hsd1, 2}, {comcast, 2} and {verizon, 1}.
Sorting and ranking
[Figure: the per-word counts are sorted in descending order to produce the ranking (steps ①, ②, ③).]
# Sort "word count" lines by the count (second field), largest first.
@list1 = reverse sort { (split(/\s/,$a))[1] <=> (split(/\s/,$b))[1] } @list1;
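For comparison, the same ranking step can be written in Python. This is an equivalent sketch, not the authors' code: it assumes each line has the form "word count", and the function name is hypothetical.

```python
def rank_by_count(lines: list[str]) -> list[str]:
    """Sort 'word count' lines by the numeric count, largest first."""
    return sorted(lines, key=lambda line: int(line.split()[1]), reverse=True)


print(rank_by_count(["verizon 1", "hsd1 2", "comcast 3"]))
```

One small difference from reversing an ascending sort: `sorted(..., reverse=True)` keeps lines with equal counts in their original relative order.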
# of nodes: ranking in one day
RANK  Country        # of nodes  Region           Domain
1     Russia          1,488,056  Russia           RU
2     United States   1,177,766  North America    US
3     China             815,934  East Asia        CN
4     UK                414,282  West Europe      GB
5     Canada            408,592  North America    CA
6     Ukraine           399,054  East Europe      UA
7     France            394,005  West Europe      FR
8     India             309,008  South Asia       IN
9     Taiwan            296,856  East Asia        TW
10    Brazil            271,417  South America    BR
11    Japan             262,678  East Asia        JP
12    Romania           233,536  East Europe      RO
13    Bulgaria          226,885  East Europe      BG
14    South Korea       217,409  East Asia        KR
15    Australia         216,250  Oceania          AU
16    Poland            184,087  East Europe      PL
17    Sweden            183,465  North Europe     SE
18    Thailand          183,008  South East Asia  TH
19    Italy             177,932  West Europe      IT
20    Spain             172,969  West Europe      ES
Visualization: KML (Keyhole Markup Language)
■ KML is an XML-based file format for displaying geographic data in Google Earth.
■ The TimeSpan tag makes it possible to animate our crawling log smoothly in Google Earth.
EU: 4 UK 414,282 West Europe GB
UK (code: GB), nodes by city, with (population: nodes as a share of population) where given:
N/A 77490; London 47559 (7550000: 0.6%); Manchester 9808 (441000: 2%); Birmingham 6617; Leeds 5111; Glasgow 4841; Brighton 4788; Liverpool 4445; Bristol 3814; Sheffield 3536; Upon 3363; Edinburgh 3140; Nottingham 2412; Newcastle 2297; Bradford 2093; Tyne 2091; Stoke-on-trent 2021; Coventry 1965; Preston 1902; Reading 1814
[Figure: chart for GB, RU, JP, CN and US; x axis 1 to 12, y axis 0 to 250.]
Rank 1 Russia 1,488,056
Nodes by city: Moscow 284959 (13670000: 2%); Saint 69220; Petersburg 69220 (4580000: 1.5%); N/A 51734; Novgorod 35505 (1330000: 2.6%); Yekaterinburg 31117; Velikiy 29706; Perm 28858; Tomsk 19083; Novosibirsk 18379; Voronezh 15121; Irkutsk 14943; Krasnoyarsk 14489; Ufa 11823; Lenin 11640; Tyumen 11615; Penza 10665; Izhevsk 10259; Volgograd 10126 (1000000); Saratov 9686
[Figure: chart for GB, RU, JP, CN and US; x axis 1 to 12, y axis 0 to 250.]
Rank 1 Russia 1,488,056
Nodes by domain word: ru 869194; pppoe 157254; broadband 120719; corbina 114501; ertelecom 103364; dynamic 78683; nationalcablenetworks 34208; netbynet 28339; bb 28260; ufanet 26827; avangarddsl 26225; dyn 22174; mts-nn 21939; mtu-net 19994; 95 19274; bashtel 18588; 94 17260; nn 15149; dsl 14746; 178 14292
Corbina Telecom/Корбина Телеком: corbina.ru/
Home | ER-Telecom (ЭР-Телеком): www.ertelecom.ru/
UfaNet.ru: www.ufanet.ru/
Demo: observed nodes in Moscow
10 million nodes in 24 hours!
Island in the stream: Male
[root@localhost ranking]# geoiplookup -f
MV, 40, Male, N/A, 4.166700, 73.500000, 0, 0
Island in the stream: Arue
[root@localhost ~]# nslookup *.*.*.*
Non-authoritative answer:
.in-addr.arpa name = *.*.*.* dsl.dyn.mana.pf.
Authoritative answers can be found from
PF, 00, Arue, N/A, -17.516800, -149.500000, 0, 0
Rank 2 United States 1,177,766
Nodes by city: N/A 207179; San 29263; Dallas 18899; New 16213; Saint 11933; Houston 11401; Los 10931; Chicago 10876; Fort 10845; Park 10465; Angeles 10400; Brooklyn 9769; York 9462; Lake 8885; Miami 7575; Diego 7161; Francisco 6743; Portland 6553; Washington 6266; Las 6205; Vegas 5956
[Figure: chart for GB, RU, JP, CN and US; x axis 1 to 12, y axis 0 to 250; an annotation reads "25675 ??".]
Rank 2 United States 1,177,766
Nodes by domain word: user 78494; com 76803; br 45945; veloxzone 42333; ono 27937; dyn 26909; 84 8460; users 4754; 81 4336; ru 4266; 62 4189; net 3725; 85 3134; mns 2681; 82 2454; 79 2152; 212 2122; vivozap 1952; 213 1889; 217 1868
Veloxzone: veloxzone.com.br - Robtex ??
Brazilian mobile phone operator belonging to the Portugal Telecom and Telefonica groups. ??
Rank 11 Japan 262,678
Nodes by city: N/A 69648; Tokyo 54531 (13100000: 0.4%); Osaka 7430 (8860000: ??); Yokohama 6983; Nagoya 4114; Kawasaki 3503; Fukuoka 2989; Kyoto 2875; Chiba 2443; Kobe 2409; Sapporo 2015; Shizuoka 1667; Hamamatsu 1396; Hiroshima 1356; Setagaya 1339; Nara 1239; Sagamihara 1151; Toyonaka 1089; Kawaguchi 1077; Tokorozawa 980
[Figure: chart for GB, RU, JP, CN and US; x axis 1 to 12, y axis 0 to 250.]
Rank 11 Japan 262,678
Nodes by domain word: jp 226354; ne 173513; ocn 52352 (8000000: 0.6%); ap 38034; or 22745; dion 20057; ppp 19918; ppp-bb 17674; plala 17520; ad 14851; mesh 11932; so-net 11482; eonet 11184; infoweb 10615; nt 9431; rev 9181; home 9116; yournet 8507; tokyo 7926; ftth 7814
Welcome to the official OCN site: ocn.ne.jp
auone-net, high-speed mobile and optical internet service provider: www.auone-net.jp
Rank 3 China 815,934 East Asia CN
Nodes by city: Beijing 240419 (17500000: 1%); Guangzhou 52981 (10330000: 0.5%?); Shanghai 27399 (18580000: 0.1%?); Jinan 26281; N/A 24695; Chengdu 18835; Shenyang 18566; Tianjin 18460; Hebei 17414; Wuhan 15239; Hangzhou 12997; Harbin 10848; Changchun 10411; Nanning 10318; Qingdao 10257; Taiyuan 9573; Hefei 9455; Changsha 6988; Chongqing 5641; Shenzhen 5600
[Figure: chart for GB, RU, JP, CN and US; x axis 1 to 12, y axis 0 to 250.]
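Rank 3 China 815,934 East Asia CN
Nodes by domain word: cn 90196; com 65413; dynamic 65060; 163data 64647; broad 59136; adsl-pool 17127; sh 10473; xw 10398; net 10352; sx 10196; gd 9641; 222 9297; fj 8826; js 8531; jlccptt 7820; zj 6900; 117 6687; 125 6532; 218 6371; 60 6244
dynamic.163data.com.cn ??
Jilin Province Data Communication Bureau; Beijing Xinnet Digital Information Technology Co., Ltd. ??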
All cities
N/A 978457; Moscow 285097 (RU:1); Beijing 240419 (CN:3); Seoul 180186 (KR) (1000000:1%); Taipei 161498 (TW:9); Kiev 117392 (UA:6); Saint 94560 (Petersburg ?); Bucharest 79336 (1940000:4%); Sofia 78445 (BG:13); New 72424; Petersburg 71175 (RU:1); Central 65635 (HK?); District 65485 (HK?); Bangkok 62882 (TH:18); Delhi 62563 (IN:8); Tokyo 54531 (JP:11); London 53514 (GB:4); Guangzhou 52981 (CN:3); Athens 52656 (3680000: 1.4%); Budapest 52031 (1,733,685: 3%)
All the world
Nodes by domain word: net 2676477; com 1369148; ru 869195; dynamic 685144; dsl 430313; comcast 303649; hsd1 303626; br 244534; jp 226366; adsl 222170; cable 217597; au 203850; dyn 200646; pppoe 187455; pool 183580; static 180225; ne 173788; broadband 173384; co 171029; rr 170298; res 169568; ca 165639; hinet 162089; pl 160772; it 151052; fr 146154; bb 143578; hu 139452; sbcglobal 135016; ua 133288
Comcast: High Speed Internet, Cable TV, and Phone Services Deals
HiNet home page: Taiwan's largest ISP, providing broadband internet
sbcglobal.net - Network Solutions ??
Demo: flying over Eurasia
10 million nodes in 24 hours!
Conclusion
In this presentation, we have shown that it is possible to obtain information on 10,000,000 nodes in 24 hours.
In current P2P and DHT networks, each node can be easily monitored. There are also many challenges and interesting topics concerning illegal use of BitTorrent.
Our crawling system can provide rankings of countries, cities and domain providers.
We have shown that the DHT network is indeed a large and scalable network! BitTorrent has huge potential as an alternative, largely unseen network architecture!
Thank you for listening!