Rapid and Massive Monitoring of DHT: Crawling 10 Million Nodes in 24 Hours

PacSec Tokyo, November 2011

In this presentation, we present our high-speed DHT crawler, which monitors 10 million nodes in 24 hours.

Ruo Ando, NICT (National Institute of Information and Communications Technology)

Takayuki Sugiura, NetAgent Co. Ltd.


Page 2: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Overview: detecting illegal use in a huge network

• BitTorrent has become an indispensable network application for distributing software and content. But...

• No one knows its exact scale and dynamics. How many nodes join and leave the BitTorrent network in 24 hours?

• The BitTorrent network is so large that no one knows where (potential) security incidents and illegal use are occurring.

• We tackled the challenge of monitoring this very large network with our rapid and massive DHT crawler.

• We succeeded in observing 10,000,000 nodes in 24 hours.

• We also present a visualization of the dynamics of the BitTorrent network.

PacSec 2011

Page 3: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Demo: observed nodes

10 million nodes in 24 hours!

Page 4: Pac sec2011 ruoando-nict-2011-11-09-01-eng

BT: the largest file-sharing network in the world

BitTorrent is estimated to have 70 million active users and 100 million total users, and the number is still increasing.

Page 5: Pac sec2011 ruoando-nict-2011-11-09-01-eng

BitTorrent is expanding and is now everywhere!

● BitTorrent in portable USB storage devices: http://www.iodata.com/

● Android: BitTorrent Client | aBTC, available for about $5: https://market.android.com/

Page 6: Pac sec2011 ruoando-nict-2011-11-09-01-eng

The old new problem: illegal content downloads

BitTorrent is one of the most efficient ways to share large files such as operating system ISO images.

Unfortunately, BT is at the same time a very efficient way to illegally download copyrighted content such as movies and music.

The biggest BitTorrent case: in 2010, the US Copyright Group (USCG) alleged that 23,322 IP addresses had infringed the copyright of the movie The Expendables. Settlements are around $3,000 per infringement.

Page 7: Pac sec2011 ruoando-nict-2011-11-09-01-eng

The case of LimeWire, October 2010

● In October 2010, a New York judge ordered LimeWire to shut down its file-sharing software.

A US federal court judge ruled that LimeWire's service was being used as software for infringing copyrighted content.

● Soon afterwards, a new version of LimeWire called LPE (LimeWire Pirate Edition) was released by anonymous developers as a resurrection of the client.

Page 8: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Right to be deleted or forgotten?

November 2010: the EC announced its plan to set out a strategy to strengthen EU data protection rules.

People in the EU generally regard the current pervasive use of BitTorrent and its potential as promising. They would also like BitTorrent to be used legally.

Page 9: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Dot-P2P: domain seizures and BT-based DNS

● June 2010: WikiLeaks leverages torrents and magnet links for distributing files.

● U.S. Immigration and Customs Enforcement (ICE) seizes the site domain of a BT meta-search engine.

● The U.S. proposed the Combating Online Infringement and Counterfeits Act (COICA), which would allow the Department of Justice to order a domain registrar to take a domain offline. COICA is aimed at increasing the government's censorship powers.

● In direct response to the domain seizures by US authorities, the Dot-P2P project proposes a DNS service independent of ICANN and ISPs.

● In the Dot-P2P system, a request for the .p2p TLD is redirected to a locally hosted DNS database. The traffic is encrypted and sent according to the BitTorrent protocol, so the .p2p TLD is decentralized and independent of ICANN or any ISP's DNS service.

Page 10: Pac sec2011 ruoando-nict-2011-11-09-01-eng

BitTorrent history

Bram Cohen began implementing BitTorrent in 2001.

He released the client software in 2003.

In 2003, a user in the EU released an ISO image of Red Hat, and 30,000 copies of the image were downloaded in 3 days.

In 2004, he founded BitTorrent Inc., and by mid-2005 BitTorrent Inc. was funded by venture capital.

Page 11: Pac sec2011 ruoando-nict-2011-11-09-01-eng

BitTorrent traffic estimations

① "55%" - CableLabs: about half of the upstream traffic of CATV.

② "35%" - CacheLogic: "LIVEWIRE - File-sharing network thrives beneath the Radar"

③ "60%" - documents on www.sans.edu: "It is estimated that more than 60% of the traffic on the internet is peer-to-peer."

Page 12: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Basic architecture of tracker network

① Ask: Node A (a newcomer) asks the tracker to search for the file.

② Torrent download: the tracker provides the torrent file.

③ Join: Node A queries node B.

④ Download: Node A can download pieces of the file from the swarm.

A seeder has the complete file. A leecher has pieces of the file.

Page 13: Pac sec2011 ruoando-nict-2011-11-09-01-eng

BitTorrent network: tracker or DHT (trackerless)

Tracker: a dedicated machine that stores torrent files and keeps track of which nodes are downloading and uploading.

DHT: a decentralized network architecture that shares the functionality of the tracker among nodes. DHT is decentralized, yet more scalable than pure P2P.

A DHT (Distributed Hash Table) is a method that uses <key, value> pairs. The DHT lookup method enables us to discover the location of the node that shares the tracker responsibility for a given file.

Recently, the DHT network has attracted much attention due to the Dot-P2P project and The Pirate Bay's confirmation that it is stopping its tracker.

Page 14: Pac sec2011 ruoando-nict-2011-11-09-01-eng

DHT Protocol

● DHT is not a new specification: it was introduced to Azureus (2005) and BitComet (2005).

● Based on Kademlia, an XOR-based DHT: Petar Maymounkov and David Mazières. Kademlia: A peer-to-peer information system based on the XOR metric. In Proceedings of the 1st International Workshop on Peer-to-Peer Systems (IPTPS '02).

● Supported by many client applications: uTorrent 1.8.5, Vuze 4.3.0.2, BitTorrent 6.3, BitComet 1.16, Transmission 1.76
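
The XOR metric named in the Kademlia reference can be illustrated with a minimal Python sketch. It assumes node IDs are 20-byte strings read as unsigned integers; the function names `xor_distance` and `closest` are illustrative and are not taken from any client or from the crawler itself.

```python
def xor_distance(id_a: bytes, id_b: bytes) -> int:
    """Kademlia distance: the two IDs XORed, read as an unsigned integer."""
    if len(id_a) != len(id_b):
        raise ValueError("IDs must have the same length")
    return int.from_bytes(id_a, "big") ^ int.from_bytes(id_b, "big")


def closest(target: bytes, node_ids: list[bytes], k: int = 8) -> list[bytes]:
    """Return the k node IDs nearest to target under the XOR metric."""
    return sorted(node_ids, key=lambda nid: xor_distance(target, nid))[:k]


if __name__ == "__main__":
    # Hypothetical IDs that differ only in their last byte.
    target = bytes(19) + b"\x05"
    nodes = [bytes(19) + bytes([i]) for i in (1, 4, 7, 12)]
    print([n[-1] for n in closest(target, nodes, k=2)])
```

A node answering a lookup would return its K closest known nodes under this ordering, which is why repeated lookups converge on the target ID.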

Page 15: Pac sec2011 ruoando-nict-2011-11-09-01-eng

DHT Protocol

● Magnet links are URLs that enable each node to download and/or distribute content without querying a tracker site.

● Magnet links are provided by The Pirate Bay and Mininova to speed up downloads (the hash is base32 or hex encoded).

● 2010: The Pirate Bay moves to magnet-link-oriented DHT, shutting down its server.

● Do magnet links make the BitTorrent network trackerless?
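
As a hedged illustration of the two encodings mentioned above, the following Python sketch extracts the 20-byte info-hash from a magnet URI. It assumes the common `xt=urn:btih:` form with either a 40-character hex or a 32-character base32 hash; the function name `info_hash_from_magnet` is hypothetical.

```python
import base64
from urllib.parse import urlparse, parse_qs


def info_hash_from_magnet(uri: str) -> bytes:
    """Extract the 20-byte info-hash from a magnet URI's xt=urn:btih: field."""
    parsed = urlparse(uri)
    if parsed.scheme != "magnet":
        raise ValueError("not a magnet URI")
    for xt in parse_qs(parsed.query).get("xt", []):
        if xt.lower().startswith("urn:btih:"):
            encoded = xt[len("urn:btih:"):]
            if len(encoded) == 40:      # hex encoding
                return bytes.fromhex(encoded)
            if len(encoded) == 32:      # base32 encoding
                return base64.b32decode(encoded.upper())
            raise ValueError("unexpected info-hash length")
    raise ValueError("no urn:btih topic found")
```

The resulting 20 bytes are the key a client would look up in the DHT instead of asking a tracker.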

Page 16: Pac sec2011 ruoando-nict-2011-11-09-01-eng

DHT Protocol

The DHT network is a scalable architecture for file-sharing systems. Pure P2P: hundreds of thousands of nodes. DHT: millions of nodes.

The BitTorrent DHT network is implemented over KRPC. The KRPC protocol is an RPC mechanism over UDP.

DHT queries have four kinds of messages: ping, find_node, get_peers and announce_peer. Each is encoded with bencode.
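
To make the encoding concrete, here is a minimal bencode encoder in Python together with a helper that builds a KRPC ping query. This is a sketch, not the crawler's code: `bencode` and `ping_query` are illustrative names, and the message layout (keys "t", "y", "q", "a") follows the publicly documented BitTorrent DHT message format.

```python
def bencode(obj) -> bytes:
    """Encode ints, byte strings, text strings, lists and dicts as bencode."""
    if isinstance(obj, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, str):
        obj = obj.encode("utf-8")
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(x) for x in obj) + b"e"
    if isinstance(obj, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(((k.encode() if isinstance(k, str) else k, v)
                        for k, v in obj.items()), key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(obj).__name__}")


def ping_query(node_id: bytes, transaction_id: bytes = b"aa") -> bytes:
    """Build a KRPC ping query carrying the sender's 20-byte node ID."""
    return bencode({"t": transaction_id, "y": "q", "q": "ping",
                    "a": {"id": node_id}})
```

The returned bytes would be sent as a single UDP datagram to the queried node.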

Page 17: Pac sec2011 ruoando-nict-2011-11-09-01-eng

DHT Protocol

Kademlia defines four RPCs (PING, STORE, FIND_NODE and FIND_VALUE); the BitTorrent DHT's counterparts are ping, announce_peer, find_node and get_peers.

• PING: the basic query for checking that the queried node is alive. Its argument "id" is the 20-byte node ID of the querying node, in network byte order.

• FIND_NODE: used to obtain the contact information of a node given its ID. The response should contain a key "nodes" holding the compact node info for the target node or for the K (8) closest nodes in the queried node's routing table.

arguments: {"id" : "<querying nodes id>", "target" : "<id of target node>"}

response: {"id" : "<queried nodes id>", "nodes" : "<compact node info>"}
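
The "compact node info" string can be unpacked as sketched below. The sketch assumes the usual IPv4 layout of 26 bytes per node (20-byte node ID, 4-byte address, 2-byte port in network byte order); `parse_compact_nodes` is an illustrative name.

```python
import socket
import struct


def parse_compact_nodes(blob: bytes) -> list[tuple[bytes, str, int]]:
    """Split a "nodes" string into (node_id, ip, port) tuples."""
    if len(blob) % 26:
        raise ValueError("compact node info must be a multiple of 26 bytes")
    nodes = []
    for off in range(0, len(blob), 26):
        node_id = blob[off:off + 20]
        ip = socket.inet_ntoa(blob[off + 20:off + 24])
        (port,) = struct.unpack("!H", blob[off + 24:off + 26])
        nodes.append((node_id, ip, port))
    return nodes
```

Each find_node response therefore yields up to K new (ID, address, port) contacts that a crawler can query in turn.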

Page 18: Pac sec2011 ruoando-nict-2011-11-09-01-eng

DHT Protocol

Kademlia defines four RPCs (PING, STORE, FIND_NODE and FIND_VALUE); the BitTorrent DHT's counterparts are ping, announce_peer, find_node and get_peers.

• GET_PEERS: used to get the peers associated with a torrent infohash. If the queried node has peers for the infohash, they are returned under the key "values" as a list of strings. If not, the response contains the K nodes in the queried node's routing table closest to the infohash.

• ANNOUNCE_PEER: used to announce that the peer controlling the querying node is downloading a torrent on a port.

arguments: {"id" : "<querying nodes id>", "info_hash" : "<20-byte infohash of target torrent>", "port" : <port number>, "token" : "<opaque token>"}
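
The two possible outcomes of a get_peers query can be handled as in the following sketch. It assumes the reply has already been bencode-decoded into a Python dictionary with byte-string keys and that each peer string is the usual 6-byte compact form (4-byte IPv4 address plus 2-byte port); both function names are illustrative.

```python
import socket
import struct


def parse_compact_peers(values: list[bytes]) -> list[tuple[str, int]]:
    """Each peer string is 6 bytes: 4-byte IPv4 address + 2-byte port."""
    peers = []
    for v in values:
        if len(v) != 6:
            raise ValueError("compact peer info must be 6 bytes")
        peers.append((socket.inet_ntoa(v[:4]), struct.unpack("!H", v[4:])[0]))
    return peers


def handle_get_peers(response: dict) -> tuple[str, object]:
    """Classify the already-decoded response dictionary of a get_peers reply."""
    if b"values" in response:
        return "peers", parse_compact_peers(response[b"values"])
    # No peers known: the node returned its closest nodes instead.
    return "nodes", response.get(b"nodes", b"")
```

In the "nodes" case the caller would continue the lookup by querying the returned nodes.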

Page 19: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Monitoring system architecture

[Figure: several DHT crawlers collect node data from the DHT network and write it to a key-value store (<key> = node ID, <value> = data such as address and port). The dumped data is then processed by Map, Shuffle and Reduce stages. Scale out!]

Page 20: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Scaling out crawlers!

[Figure: multiple DHT crawlers running on a hypervisor, all querying the DHT network.]

The response to find_node should contain a key "nodes" holding the compact node info for the target node or for the K (8) closest nodes in the queried node's routing table.

The node information returned this way should be randomly distributed.

So scaling out crawlers is an effective way to expand the monitoring range!

The DHT crawlers run on virtualized Linux images.

The hypervisor is VMware ESX, which provides a rich interface for managing crawlers.
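
The crawling idea can be sketched in a few lines of Python. This is not the authors' crawler: it is a breadth-first loop under the assumption that a caller-supplied `find_node(node, target)` function sends one find_node query and returns the nodes from the reply. Discovered nodes are kept in a dictionary keyed by node ID, mirroring the key-value store of the architecture slide.

```python
import os
from collections import deque
from typing import Callable, Iterable

Node = tuple[bytes, str, int]          # (node_id, ip, port)


def crawl(bootstrap: Iterable[Node],
          find_node: Callable[[Node, bytes], Iterable[Node]],
          limit: int) -> dict[bytes, tuple[str, int]]:
    """Breadth-first crawl: ask every known node for more nodes.

    find_node(node, target) is supplied by the caller; it should return
    the nodes contained in the reply (or nothing on timeout).
    """
    seen: dict[bytes, tuple[str, int]] = {}
    queue = deque()
    for node in bootstrap:
        if node[0] not in seen:
            seen[node[0]] = node[1:]
            queue.append(node)
    while queue and len(seen) < limit:
        node = queue.popleft()
        target = os.urandom(20)        # random target spreads the search
        for found in find_node(node, target):
            if found[0] not in seen and len(seen) < limit:
                seen[found[0]] = found[1:]
                queue.append(found)
    return seen
```

Running several such crawlers with different random targets, as on the slide, would cover different regions of the ID space in parallel.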

Page 21: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Hadoop & MapReduce

[Figure: dump data flows through Map, Shuffle and Reduce stages. Scale out! Hadoop & MapReduce run on Linux (RH).]

Retrieval: geolocation, domain name

Translation: KML (XML)

Ranking: word count, sorting

Page 22: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Rapid crawling: 24 hours to reach 10,000,000 nodes!

[Figure: two charts over hours 0 to 26. "node": number of observed nodes, axis from 0 to 12,000,000. "diff": hourly difference, axis from 10,000 to 1,000,000.]

Page 23: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Visualization & ranking

*.*.39.201,6881,2011/9/25 23:57:43,1

*.*.210.128,62845,2011/9/25 23:56:32,1

*.*.33.212,6881,2011/9/25 23:33:58,1

*.*.9.21,49924,2011/9/25 23:37:02,1

IP address, time → location info (country, city, latlng), domain name → KML movie, ranking

[Figure: chart with series GB, RU, JP, CN and US; x-axis 1 to 12, y-axis 0 to 250.]
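
The sample log records can be parsed with a short Python sketch. The field names are assumptions: the slide labels only the IP address and the time, the second field looks like a port number, and the meaning of the fourth field is not stated, so it is kept as an opaque string.

```python
import csv
from datetime import datetime
from typing import NamedTuple


class Observation(NamedTuple):
    ip: str            # first two octets are masked as "*" on the slide
    port: int          # assumed to be the node's port
    seen_at: datetime
    extra: str         # fourth field; its meaning is not stated on the slide


def parse_log(lines):
    """Parse 'ip,port,YYYY/M/D HH:MM:SS,extra' records."""
    for row in csv.reader(lines):
        if len(row) != 4:
            continue   # skip malformed records
        ip, port, stamp, extra = row
        yield Observation(ip, int(port),
                          datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S"), extra)
```

The timestamp is what later drives the time-based animation, and the address is what is passed to geolocation and reverse DNS lookups.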

Page 24: Pac sec2011 ruoando-nict-2011-11-09-01-eng

MapReduce

[Figure: Input → Map (×3) → Reduce (×3) → Output]

MapReduce is a programming model for processing big data.

map(key1, value1) -> list<key2, value2>

reduce(key2, list<value2>) -> list<value3>

Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI '04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December 2004.

Page 25: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Map

*.*.194.107,h116-0-194-107.catv02.itscom.jp

*.*.27.107,c-76-28-27-107.hsd1.ct.comcast.net

*.*.239.181,c-68-40-239-181.hsd1.mi.comcast.net

*.*.44.184,pool-96-253-44-184.prvdri.fios.verizon.net

*.*.170.168,cpc11-stok15-2-0-cust167.1-4.cable.virginmedia.com

*.*.23.81,cpc2-stkn10-0-0-cust848.11-2.cable.virginmedia.com

[Figure: the words extracted from the log (hsd1, comcast, verizon, virginmedia, ...), each paired with the value 1.]

The log string is divided into words, and each word is assigned "1". Key-value: {word, 1}

Page 26: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Reduce

[Figure: the {word, 1} pairs for hsd1, comcast and verizon are grouped by word.]

Reduce: add up the 1s for each word. Key-value: {hsd1, 2} / {comcast, 2} / {verizon, 1}
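
The map and reduce steps of the word count can be sketched in plain Python. This runs in a single process rather than on Hadoop, and it assumes that "words" are the dot-separated labels of the host name; the function names are illustrative.

```python
from collections import defaultdict


def map_words(record: str):
    """Map: emit (word, 1) for each dot-separated label of the host name."""
    _ip, hostname = record.split(",", 1)
    for word in hostname.split("."):
        yield word, 1


def shuffle(pairs):
    """Shuffle: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce: sum the 1s emitted for one word."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce over all records."""
    pairs = (pair for rec in records for pair in map_words(rec))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())
```

On a cluster, the map calls run in parallel over chunks of the log and the framework performs the shuffle between the two stages.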

Page 27: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Sorting and ranking

[Figure: the word counts for hsd1, comcast and verizon are sorted into a ranking (steps ① and ②).]

@list1 = reverse sort { (split(/\s/,$a))[1] <=> (split(/\s/,$b))[1] } @list1;
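
For readers less familiar with Perl, the same ranking step can be written in Python as below. The sketch assumes each line has the form "word count" separated by whitespace; the order of lines with equal counts may differ from the Perl version.

```python
def rank(lines: list[str]) -> list[str]:
    """Sort 'word count' lines by the numeric second field, largest first."""
    return sorted(lines, key=lambda line: int(line.split()[1]), reverse=True)


if __name__ == "__main__":
    for line in rank(["verizon 1", "hsd1 2", "comcast 3"]):
        print(line)
```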

Page 28: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Ranking by number of nodes in one day

RANK  Country        # of nodes  Region           Domain
1     Russia         1,488,056   Russia           RU
2     United States  1,177,766   North America    US
3     China          815,934     East Asia        CN
4     UK             414,282     West Europe      GB
5     Canada         408,592     North America    CA
6     Ukraine        399,054     East Europe      UA
7     France         394,005     West Europe      FR
8     India          309,008     South Asia       IN
9     Taiwan         296,856     East Asia        TW
10    Brazil         271,417     South America    BR
11    Japan          262,678     East Asia        JP
12    Romania        233,536     East Europe      RO
13    Bulgaria       226,885     East Europe      BG
14    South Korea    217,409     East Asia        KR
15    Australia      216,250     Oceania          AU
16    Poland         184,087     East Europe      PL
17    Sweden         183,465     North Europe     SE
18    Thailand       183,008     South East Asia  TH
19    Italy          177,932     West Europe      IT
20    Spain          172,969     West Europe      ES

Page 29: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Visualization: KML (Keyhole Markup Language)

■ KML is an XML-based file format for displaying geographic data on Google Earth.

■ The TimeSpan tag makes it possible to animate our crawling log smoothly on Google Earth.
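
A hedged sketch of how one observation could be written as a KML placemark with a TimeSpan, using only Python's standard library. The helper names `placemark` and `build_kml` are illustrative, the timestamps in the usage note are made up, and the real crawler's KML layout is not shown on the slides.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def placemark(name, lat, lon, begin, end):
    """Build one <Placemark> that is visible only between begin and end."""
    pm = ET.Element(f"{{{KML_NS}}}Placemark")
    ET.SubElement(pm, f"{{{KML_NS}}}name").text = name
    span = ET.SubElement(pm, f"{{{KML_NS}}}TimeSpan")
    ET.SubElement(span, f"{{{KML_NS}}}begin").text = begin
    ET.SubElement(span, f"{{{KML_NS}}}end").text = end
    point = ET.SubElement(pm, f"{{{KML_NS}}}Point")
    # KML coordinates are written longitude first, then latitude.
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return pm


def build_kml(placemarks) -> str:
    """Wrap the placemarks in a <kml><Document> envelope and serialize it."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    doc.extend(placemarks)
    return ET.tostring(kml, encoding="unicode")
```

For example, `build_kml([placemark("Male", 4.1667, 73.5, "2011-09-25T23:00:00Z", "2011-09-26T00:00:00Z")])` yields a document in which the point appears only during that hour when the Google Earth time slider is played.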

Page 30: Pac sec2011 ruoando-nict-2011-11-09-01-eng

EU: Rank 4, UK, 414,282 nodes, West Europe, GB

UK (code: GB)
N/A 77490
London 47559 (7550000: 0.6%)
Manchester 9808 (441000: 2%)
Birmingham 6617
Leeds 5111
Glasgow 4841
Brighton 4788
Liverpool 4445
Bristol 3814
Sheffield 3536
Upon 3363
Edinburgh 3140
Nottingham 2412
Newcastle 2297
Bradford 2093
Tyne 2091
Stoke-on-trent 2021
Coventry 1965
Preston 1902
Reading 1814

[Figure: chart with series GB, RU, JP, CN and US; x-axis 1 to 12, y-axis 0 to 250.]
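
The parenthesized figures next to some cities appear to give the city's population followed by the observed nodes as a share of that population. Under that assumption (the slides do not say so explicitly), the percentages can be reproduced with a small Python sketch; `node_ratio_percent` is an illustrative name.

```python
def node_ratio_percent(nodes: int, population: int) -> float:
    """Observed nodes as a percentage of the city's population."""
    return 100.0 * nodes / population


if __name__ == "__main__":
    # Node counts and populations as printed on the UK slide.
    for city, nodes, population in [("London", 47559, 7550000),
                                    ("Manchester", 9808, 441000)]:
        print(f"{city}: {node_ratio_percent(nodes, population):.1f}%")
```

Note that a node is an observed DHT participant (address and port), not necessarily a distinct person, so the ratio is only a rough indicator.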

Page 31: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Rank 1: Russia, 1,488,056 nodes

Moscow 284959 (13670000: 2%)
Saint 69220
Petersburg 69220 (4580000: 1.5%)
N/A 51734
Novgorod 35505 (1330000: 2.6%)
Yekaterinburg 31117
Velikiy 29706
Perm 28858
Tomsk 19083
Novosibirsk 18379
Voronezh 15121
Irkutsk 14943
Krasnoyarsk 14489
Ufa 11823
Lenin 11640
Tyumen 11615
Penza 10665
Izhevsk 10259
Volgograd 10126 (1000000)
Saratov 9686

[Figure: chart with series GB, RU, JP, CN and US; x-axis 1 to 12, y-axis 0 to 250.]

Page 32: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Rank 1: Russia, 1,488,056 nodes

ru 869194
pppoe 157254
broadband 120719
corbina 114501
ertelecom 103364
dynamic 78683
nationalcablenetworks 34208
netbynet 28339
bb 28260
ufanet 26827
avangarddsl 26225
dyn 22174
mts-nn 21939
mtu-net 19994
95 19274
bashtel 18588
94 17260
nn 15149
dsl 14746
178 14292

Corbina Telecom (Корбина Телеком): corbina.ru/

ER-Telecom (ЭР-Телеком), home page: www.ertelecom.ru/

UfaNet.ru: www.ufanet.ru/

Page 33: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Demo: observed nodes in Moscow

10 million nodes in 24 hours!

Page 34: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Island in the stream: Male

[root@localhost ranking]# geoiplookup -f
MV, 40, Male, N/A, 4.166700, 73.500000, 0, 0

Page 35: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Island in the stream: Arue

[root@localhost ~]# nslookup *.*.*.*
Non-authoritative answer:
.in-addr.arpa name = *.*.*.* dsl.dyn.mana.pf.

Authoritative answers can be found from

PF, 00, Arue, N/A, -17.516800, -149.500000, 0, 0

Page 36: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Rank 2: United States, 1,177,766 nodes

N/A 207179
San 29263
Dallas 18899
New 16213
Saint 11933
Houston 11401
Los 10931
Chicago 10876
Fort 10845
Park 10465
Angeles 10400
Brooklyn 9769
York 9462
Lake 8885
Miami 7575
Diego 7161
Francisco 6743
Portland 6553
Washington 6266
Las 6205
Vegas 5956

Annotation: 25675 ?? (the sum of the New and York counts)

[Figure: chart with series GB, RU, JP, CN and US; x-axis 1 to 12, y-axis 0 to 250.]

Page 37: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Rank 2: United States, 1,177,766 nodes

user 78494
com 76803
br 45945
veloxzone 42333
ono 27937
dyn 26909
84 8460
users 4754
81 4336
ru 4266
62 4189
net 3725
85 3134
mns 2681
82 2454
79 2152
212 2122
vivozap 1952
213 1889
217 1868

Veloxzone: veloxzone.com.br - Robtex ??

Brazilian mobile phone operator belonging to the Portugal Telecom and Telefonica groups. ??

Page 38: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Rank 11: Japan, 262,678 nodes

N/A 69648
Tokyo 54531 (13100000: 0.045)
Osaka 7430 (8860000: ??)
Yokohama 6983
Nagoya 4114
Kawasaki 3503
Fukuoka 2989
Kyoto 2875
Chiba 2443
Kobe 2409
Sapporo 2015
Shizuoka 1667
Hamamatsu 1396
Hiroshima 1356
Setagaya 1339
Nara 1239
Sagamihara 1151
Toyonaka 1089
Kawaguchi 1077
Tokorozawa 980

[Figure: chart with series GB, RU, JP, CN and US; x-axis 1 to 12, y-axis 0 to 250.]

Page 39: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Rank 11: Japan, 262,678 nodes

jp 226354
ne 173513
ocn 52352 (8000000: 0.6%)
ap 38034
or 22745
dion 20057
ppp 19918
ppp-bb 17674
plala 17520
ad 14851
mesh 11932
so-net 11482
eonet 11184
infoweb 10615
nt 9431
rev 9181
home 9116
yournet 8507
tokyo 7926
ftth 7814

Welcome to the official OCN site: ocn.ne.jp

auone-net, a high-speed mobile and optical-fiber Internet service provider: www.auone-net.jp

Page 40: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Rank 3: China, 815,934 nodes, East Asia, CN

Beijing 240419 (17500000: 1%)
Guangzhou 52981 (10330000: 0.5%?)
Shanghai 27399 (18580000: 0.1%?)
Jinan 26281
N/A 24695
Chengdu 18835
Shenyang 18566
Tianjin 18460
Hebei 17414
Wuhan 15239
Hangzhou 12997
Harbin 10848
Changchun 10411
Nanning 10318
Qingdao 10257
Taiyuan 9573
Hefei 9455
Changsha 6988
Chongqing 5641
Shenzhen 5600

[Figure: chart with series GB, RU, JP, CN and US; x-axis 1 to 12, y-axis 0 to 250.]

Page 41: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Rank 3: China, 815,934 nodes, East Asia, CN

cn 90196
com 65413
dynamic 65060
163data 64647
broad 59136
adsl-pool 17127
sh 10473
xw 10398
net 10352
sx 10196
gd 9641
222 9297
fj 8826
js 8531
jlccptt 7820
zj 6900
117 6687
125 6532
218 6371
60 6244

dynamic.163data.com.cn ??

Jilin Province Data Communication Bureau (吉林省数据通信局); Beijing Xinnet Digital Information Technology Co., Ltd. (北京新网数码信息技术有限公司) ??

Page 42: Pac sec2011 ruoando-nict-2011-11-09-01-eng

All cities

N/A 978457
Moscow 285097 (RU:1)
Beijing 240419 (CN:3)
Seoul 180186 (KR) (1000000: 1%)
Taipei 161498 (TW:9)
Kiev 117392 (UA:6)
Saint 94560 (Petersburg ?)
Bucharest 79336 (1940000: 4%)
Sofia 78445 (BG:13)
New 72424
Petersburg 71175 (RU:1)
Central 65635 (HK?)
District 65485 (HK?)
Bangkok 62882 (TH:18)
Delhi 62563 (IN:8)
Tokyo 54531 (JP:11)
London 53514 (GB:4)
Guangzhou 52981 (CN:3)
Athens 52656 (3680000: 1.4%)
Budapest 52031 (1,733,685: 3%)

Page 43: Pac sec2011 ruoando-nict-2011-11-09-01-eng

All the world

net 2676477
com 1369148
ru 869195
dynamic 685144
dsl 430313
comcast 303649
hsd1 303626
br 244534
jp 226366
adsl 222170
cable 217597
au 203850
dyn 200646
pppoe 187455
pool 183580
static 180225
ne 173788
broadband 173384
co 171029
rr 170298
res 169568
ca 165639
hinet 162089
pl 160772
it 151052
fr 146154
bb 143578
hu 139452
sbcglobal 135016
ua 133288

Comcast: High Speed Internet, Cable TV, and Phone Services Deals

HiNet home page: Taiwan's largest ISP, providing broadband network service

sbcglobal.net - Network Solutions ??

Page 44: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Demo: flying over Eurasia

10 million nodes in 24 hours!

Page 45: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Conclusion

In this presentation, we have shown that it is possible to obtain information on 10,000,000 nodes in 24 hours.

In current P2P and DHT networks, each node can be easily monitored. There are also many challenges and interesting topics concerning the illegal use of BitTorrent.

Our crawling system can provide rankings of countries, cities and domain providers.

We have shown that the DHT network is indeed a large and scalable network. BitTorrent has huge potential as an alternative, as yet unseen network architecture.

Page 46: Pac sec2011 ruoando-nict-2011-11-09-01-eng

Thank you for listening!