I presented the results of this research work at #www2013. Abstract: URL shortening services have become extremely popular. However, it is still unclear whether they are an effective and reliable tool that can be leveraged to hide malicious URLs, and to what extent these abuses can impact the end users. With these questions in mind, we first analyzed existing countermeasures adopted by popular shortening services. Surprisingly, we found such countermeasures to be ineffective and trivial to bypass. This first measurement motivated us to proceed further with a large-scale collection of the HTTP interactions that originate when web users access live pages that contain short URLs. To this end, we monitored 622 distinct URL shortening services between March 2010 and April 2012, and collected 24,953,881 distinct short URLs. With this large dataset, we studied the abuse of short URLs. Although short URLs are a significant new security risk, in accordance with the reports resulting from the observation of the overall phishing and spamming activity, we found that only a relatively small fraction of users ever encountered malicious short URLs. Interestingly, during the second year of measurement, we noticed an increased percentage of short URLs being abused for drive-by download campaigns and a decreased percentage of short URLs being abused for spam campaigns. In addition to these security-related findings, our unique monitoring infrastructure and large dataset allowed us to complement previous research on short URLs and analyze these web services from the user’s perspective.
Two Years of Short URL Internet Measurement
http://flic.kr/phretor
Federico Maggi, Alessandro Frossi, Stefano Zanero, Gianluca Stringhini, Brett Stone-Gross, Christopher Kruegel, Giovanni Vigna
How short URLs work: Creating an alias.
Long URL: http://example.com/very/long/?url=to&the=landing-page
User to URL shortening service: "make me shorter"
Random suffix is generated
Short URL: http://ab.cd/d73fYfz
How short URLs work: Resolving an alias.
Short URL: http://ab.cd/d73fYfz
URL shortening service answers with a redirection to http://example.com/very/long/...
Redirection mechanism:
• HTTP 302
• HTML META Refresh
• JavaScript
• ActionScript
Redirection happens on the browser's side
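The first two redirection mechanisms can be recognized from a single HTTP response. The following is a minimal sketch, not the authors' resolver: the function name `redirect_target` and its arguments are hypothetical, and JavaScript or ActionScript redirects are deliberately out of scope because detecting them requires executing code in a browser-like environment.

```python
import re
from html.parser import HTMLParser

# Matches the "url=..." part of a META refresh content attribute.
_REFRESH_URL = re.compile(r"""url\s*=\s*['"]?([^'";\s]+)""", re.IGNORECASE)


class _MetaRefreshParser(HTMLParser):
    """Remembers the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() != "refresh":
            return
        match = _REFRESH_URL.search(attributes.get("content") or "")
        if match:
            self.target = match.group(1)


def redirect_target(status, headers, body=""):
    """Return the URL a response redirects to, or None if none is found."""
    # HTTP-level redirect: 3xx status plus a Location header.
    if status in (301, 302, 303, 307, 308):
        for name, value in headers.items():
            if name.lower() == "location":
                return value
    # HTML-level redirect: META refresh inside the page body.
    parser = _MetaRefreshParser()
    parser.feed(body)
    return parser.target
```

For example, `redirect_target(302, {"Location": "http://example.com/very/long/"})` yields the long URL, whereas a 200 response is only treated as a redirect if its body carries a META refresh tag.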
From a security point of view
A perfect means of masquerading malicious URLs
• Trivially evade naïve filters
• Robust to email clients that break long URLs into multiple lines
• Trendy effect (e.g., Twitter, Facebook)
• Users have grown accustomed to short URLs
http://i.am/harmless http://evil.com/very/long/and/suspicous/path
Do shortening services take security measures?
• Prepared lists of malicious long URLs
• Malware (e.g., drive-by download) from Wepawet
• Spam from Spamhaus
• Phishing from PhishTank
• Performed 3 small measurement experiments
• Creation
• Resolution
• Periodic checks
Creation of malicious short URLs
• Submit for shortening to the top 6 services (as of 2010)
1. bit.ly
2. durl.me
3. goo.gl
4. is.gd
5. migre.me
6. tinyurl.com
• Do they accept malicious long URLs?
Long URL: http://example.com/very/long/?url=to&the=evil.com
User to URL shortening service: "make me shorter"
URL shortening service (with a blacklist) to user: "sure, here is your short URL"?
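The creation experiment probes a single decision point: whether the service checks the long URL against a blacklist before handing out an alias. Here is a minimal sketch of a service that does so, under stated assumptions: the `Shortener` class, the `ab.cd` base, the 7-character suffix and the exact-match blacklist are all hypothetical and do not describe any of the six services tested.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


class Shortener:
    """Toy in-memory shortening service (hypothetical, for illustration)."""

    def __init__(self, blacklist=(), base="http://ab.cd/", suffix_len=7):
        self.blacklist = set(blacklist)
        self.base = base
        self.suffix_len = suffix_len
        self.aliases = {}  # random suffix -> long URL

    def shorten(self, long_url):
        """Return a short URL, or None when the long URL is blacklisted."""
        if long_url in self.blacklist:
            return None
        # Draw random suffixes until an unused one is found.
        while True:
            suffix = "".join(
                secrets.choice(ALPHABET) for _ in range(self.suffix_len)
            )
            if suffix not in self.aliases:
                self.aliases[suffix] = long_url
                return self.base + suffix

    def resolve(self, short_url):
        """Return the long URL behind a short URL, or None if unknown."""
        if not short_url.startswith(self.base):
            return None
        return self.aliases.get(short_url[len(self.base):])
```

A service that skipped the `blacklist` test in `shorten` would accept every submission, which is the behavior the experiment is designed to detect.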
Malicious long URLs accepted by top services
Service       Malware          Phishing         Spam
              #       %        #       %        #       %
bit.ly        2,000   100.0    2,000   100.0    2,000   100.0
durl.me       1,999   99.9     1,987   99.4     1,976   98.8
goo.gl*       2000    99.9     994     99.4     1,000   100.0
is.gd         1,854   92.7     1,834   91.7     364     18.2
migre.me      1,738   86.9     1,266   63.3     1,634   81.7
tinyurl.com   1,959   99.5     1,935   96.8     587     29.4
Overall       9,550   95.5     9,022   90.2     6,561   65.6
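The Overall row can be cross-checked with a few lines of arithmetic. The sketch below rests on two assumptions that the slide does not state: that 2,000 URLs per threat were submitted to each service (inferred from bit.ly's 2,000 = 100.0%), and that the starred goo.gl row is left out of the total, which is the only reading under which the counts add up.

```python
# Accepted counts per service as (malware, phishing, spam), copied from the
# table above. The starred goo.gl row is excluded (assumption, see lead-in).
accepted = {
    "bit.ly": (2000, 2000, 2000),
    "durl.me": (1999, 1987, 1976),
    "is.gd": (1854, 1834, 364),
    "migre.me": (1738, 1266, 1634),
    "tinyurl.com": (1959, 1935, 587),
}

SUBMITTED_PER_SERVICE = 2000  # assumption inferred from the bit.ly row

# Column-wise totals over the five services.
totals = [sum(column) for column in zip(*accepted.values())]

# Share of all submissions that were accepted, per threat type.
submitted = SUBMITTED_PER_SERVICE * len(accepted)
percent = [round(100 * total / submitted, 1) for total in totals]

print(totals)   # accepted malware, phishing, spam URLs
print(percent)  # the same as percentages of all submissions
```

Under these assumptions the totals and percentages reproduce the Overall row.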
Resolution of malicious short URLs
• Resolved the malicious short URLs with the same services
• Do they resolve malicious URLs?
Short URL: http://i.am/so-evil
User to URL shortening service: "please resolve this"
URL shortening service (with a mapping table or blacklist) to user: "sure, here is your long URL"
Malicious short URLs resolved by top services
Service       Malware   Phishing   Spam
bit.ly        0.05      11.3       0.0
durl.me       0.0       0.0        0.0
goo.gl*       66.4      96.9       78.7
is.gd         1.08      2.27       0.8
migre.me      0.86      14.0       0.0
tinyurl.com   0.66      0.7        2.04
Overall       21.45     26.39      31.38
Periodic checks on existing short URLs
• Created a dynamic landing page (i.e., long URL)
• Submission time (day zero): serve benign content (e.g., text & images)
Dynamic long URL: http://our.server/dynamic-page.php
User to URL shortening service: "make me shorter"
URL shortening service to blacklist: "is this long URL malicious?" No!
Short URL: http://ab.cd/good-today
...we changed the content of the page after 1 week
• Resolution time (each subsequent week): serve malicious content
• Do services periodically check and sanitize existing short URLs?
Short URL: http://ab.cd/good-today
User to URL shortening service: "please resolve this"
URL shortening service to aliases DB: "is the long URL malicious?" No!
Long URL: http://our.server/dynamic-page.php
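The deferred attack works because a check made only at creation time never sees the later content. The sketch below simulates the experiment and shows what a periodic sweep over existing aliases would catch. It is a toy model under stated assumptions: `shorten`, `sweep`, `is_malicious` and the `page_content` dictionary are hypothetical stand-ins for a real service and a real content scanner.

```python
def shorten(aliases, short_url, long_url, is_malicious):
    """Creation-time check only: refuse a long URL that is malicious now."""
    if is_malicious(long_url):
        return False
    aliases[short_url] = long_url
    return True


def sweep(aliases, is_malicious):
    """Periodic check: short URLs whose landing page is malicious now."""
    return sorted(
        short for short, long_url in aliases.items() if is_malicious(long_url)
    )


# Simulated dynamic landing page: benign at submission time (day zero).
page_content = {"http://our.server/dynamic-page.php": "benign"}


def is_malicious(url):
    # Stand-in for a blacklist lookup or content scan.
    return page_content.get(url) == "malicious"


aliases = {}
accepted = shorten(
    aliases,
    "http://ab.cd/good-today",
    "http://our.server/dynamic-page.php",
    is_malicious,
)
flagged_day_zero = sweep(aliases, is_malicious)

# One week later the page starts serving malicious content.
page_content["http://our.server/dynamic-page.php"] = "malicious"
flagged_week_one = sweep(aliases, is_malicious)
```

A service that never runs something like `sweep` keeps resolving `http://ab.cd/good-today` after the switch.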
Deferred malicious short URLs survived
Threat     Shortened   Blocked   Not Blocked
Malware    5,000       20%       80%
Phishing   5,000       20%       80%
Spam       5,000       20%       80%
Overall    15,000      20%       80%
durl.me
This motivated us to start a measurement experiment to collect short URLs
A different perspective: what is the impact on users?
• Most previous work used Twitter as a source of short URLs
• Types of questions
• Do users stumble upon malicious short URLs very often?
• Are users aware of malicious short URLs?
• What kinds of short URLs do users typically encounter?
User-centered measurement.
Measurement infrastructure
Components: collectors (users), JS, our resolver, landing page
Container page: http://example.com/container
Short URLs found on the container page:
• http://ab.cd/sfb4Ac
• http://ab.cd/asd31A
• http://ab.cd/5aD3B9
• http://ab.cd/419E9s
Short URL sent to our resolver: http://ab.cd/4jaYas
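The collection step pairs each container page with the short URLs it contains. Below is a minimal sketch of that extraction, written in Python for illustration although the slide indicates the collectors ran JS. The shortener set, the regular expression and the `log_entry` record layout are assumptions, not the authors' implementation.

```python
import re
from urllib.parse import urlsplit

# Hypothetical subset of the monitored shortening services.
SHORTENERS = {"ab.cd", "bit.ly", "t.co", "tinyurl.com", "goo.gl", "ow.ly"}

# Rough URL matcher: scheme plus everything up to whitespace, quote or bracket.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_short_urls(html, shorteners=SHORTENERS):
    """Return the distinct short URLs in a page, in order of appearance."""
    found = []
    for url in URL_RE.findall(html):
        host = (urlsplit(url).hostname or "").lower()
        if host in shorteners and url not in found:
            found.append(url)
    return found


def log_entry(container_url, html):
    """Build one record linking a container page to its short URLs."""
    return {
        "container": container_url,
        "short_urls": extract_short_urls(html),
    }
```

Calling `log_entry("http://example.com/container", page_html)` on a page that links to `http://ab.cd/sfb4Ac` and to an ordinary long URL would record only the former.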
Biased measurements?
• We do not ask a user to become a collector
• We want users to subscribe as collectors spontaneously
• How? We provide a useful service!
Collected data between 2010 and 2012
• Total 7,000 distinct users (estimate from 1,370,277 distinct IPs)
• about 500 to 1,000 active users per day
• about 20,000 to 50,000 short URLs sent each day (100,000 peaks)
• 24,953,881 distinct short URLs collected overall
Dataset statistics (geolocation of the submitters)
Dataset statistics (top services)
Distinct URLs Log entries
10,069,846 bit.ly 24,818,239 bit.ly
4,725,125 t.co 12,054,996 t.co
1,418,418 tinyurl.com 5,649,043 tinyurl.com
816,744 ow.ly 2,188,619 goo.gl
800,761 goo.gl 2,053,575 ow.ly
638,483 tumblr.com 1,214,705 j.mp
597,167 fb.me 1,159,536 fb.me
584,377 4sq.com 1,116,514 4sq.com
517,965 j.mp 1,066,325 tumblr.com
464,875 tl.gd 1,045,380 is.gd
Do users stumble upon malicious short URLs very often?
Blacklist Phishing Malware Spam
Spamhaus - - 10,306
Phishtank 7 - -
Wepawet - 6,057 -
Safe Browsing 913 2,405 -
Malicious short URLs encountered by submitters
Top sources
Social networks, blogs, news
Overall baseline
Chatrooms, webmails, online IMs
Aliasing of malicious URLs
Aliasing of malicious pages using short URLs
Malicious page aliased by several short URLs:
• http://ab.cd/sfb4Ac
• http://ab.cd/asd31A
• http://ab.cd/5aD3B9
• http://ab.cd/419E9s
• . . .
x = #distinct short URL aliases per landing page
Spam pages not aliased as much as before.
[Figure 5, four CDF panels. Legend: Benign URLs, Spam URLs, Malware URLs. x-axis: log scale; y-axis: CDF.]
(a) Distinct short URLs per distinct malicious or benign landing URL from April 2010 to April 2011.
(b) Distinct short URLs per distinct malicious or benign landing URL from April 2010 to April 2012.
(c) Distinct containers per distinct malicious or benign short URL from April 2010 to April 2011.
(d) Distinct containers per distinct malicious or benign short URL from April 2010 to April 2012.
Figure 5: Comparison of the number of distinct short URLs per unique landing page (a, b) and distinct container pages per unique short URL (c, d) after 1 year (a, c) and after 2 years (b, d). The distributions have changed over time: (a) spam pages were generally aliased with a larger number of short URLs than benign pages; (b) in the following year spammers did not alias their pages as much as before. Also, (c) spam short URLs used to be spread over a larger number of container pages than benign short URLs; however, (d) they now exhibit the same distribution as their benign counterpart. We may argue that short URLs are no longer seen as a valuable means of aliasing spam pages.
[...] over their evolution. This is corroborated by the statistics about the abuse of short URLs found in the latest three APWG reports [12–14]: after a growing trend at the beginning of our measurement (2010), the subsequent reports highlight a stable (2011) and decreasing (2012) trend.
4.2 The Short URLs Ecosystem
As part of their research, Antoniades and colleagues in [2] have analyzed the category of the pages to which bit.ly and ow.ly short URLs typically point, along with the category of the container page, which they had available for bit.ly URLs only. They assigned categories to a selection of URLs. We did a similar yet more comprehensive analysis by characterizing all the short URLs that we collected by means of the categories described in the following.
Categorizing an arbitrarily large number of websites automatically is a problem that has no solution. However, our goal was to obtain a coarse-grained categorization. To this end, we relied on community-maintained directories and blacklists. More precisely, we classified the container pages (about 25,000,000 distinct URLs) and landing pages (about 22,000,000 distinct URLs) using the DMOZ Open Directory Project (http://www.dmoz.org) and URLBlacklist.com. The former is organized in a tree structure and includes 3,883,992 URLs: URLs are associated with nodes, each with localized, regional mirrors. We expanded these nodes by recursively merging the URLs found in these mirrors. The latter complements the DMOZ database with about 1,607,998 URLs and domains metadata. URLBlacklist.com is used by web-filtering tools such as SquidGuard (http://www.squidguard.org) and contains URLs belonging to clean categories (e.g., gardening, news), possibly undesired subjects (e.g., adult sites), and also malicious pages (i.e., 22.6% of the sites categorized as “antispyware”, 18.15% of those categorized as “hacking”, 8.29% of pages falling within “searchengine” domains, and 5.7% of the sites classified as “onlinepayment” are, in this order, the most rogue categories according to an analysis that we ran through McAfee SiteAdvisor (footnote 3)). Overall, we ended up with 74 categories. For clearer visualization, we selected the 48 most frequent categories. These include, for example, “socialnetworking,” “adult,” “abortion,” “contraception,” “chat,” etc. We reserved the word “other” for URLs belonging to the less meaningful categories that we removed, or for URLs [...]
Footnote 3: http://siteadvisor.com/sites/
Until first year Until second year
Spam pages more aliased than benign pages.
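The quantity on the x-axis, the number of distinct short URL aliases per landing page, and its empirical CDF can be computed from (short URL, landing page) pairs. The following is a minimal sketch with hypothetical function names and toy data; it illustrates the definition and is not the authors' analysis code.

```python
from collections import defaultdict


def aliases_per_landing(pairs):
    """Count the distinct short URLs that point to each landing page."""
    by_landing = defaultdict(set)
    for short_url, landing_url in pairs:
        by_landing[landing_url].add(short_url)
    return {landing: len(shorts) for landing, shorts in by_landing.items()}


def ecdf(values):
    """Empirical CDF as a list of (x, fraction of values <= x) points."""
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for rank, value in enumerate(ordered, start=1):
        if points and points[-1][0] == value:
            points[-1] = (value, rank / n)  # keep the last rank of a tie
        else:
            points.append((value, rank / n))
    return points


# Toy data: three landing pages with 2, 1 and 3 aliases respectively.
pairs = [
    ("http://ab.cd/a", "L1"), ("http://ab.cd/b", "L1"),
    ("http://ab.cd/c", "L2"), ("http://ab.cd/a", "L1"),
    ("http://ab.cd/d", "L3"), ("http://ab.cd/e", "L3"),
    ("http://ab.cd/f", "L3"),
]
counts = aliases_per_landing(pairs)
curve = ecdf(counts.values())
```

Computing one curve per class (benign, spam, malware) and plotting them on a logarithmic x-axis gives a comparison of the kind shown in the figure.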
Dissemination of malicious short URLs
Spreading of malicious short URLs
The same short URL (e.g., http://ab.cd/sfb4Ac) appears on several container pages:
Container page 1: http://ab.cd/asd31A, http://ab.cd/5aD3B9, http://ab.cd/sfb4Ac, http://ab.cd/419E9s
Container page 2: http://ab.cd/asd31A, http://ab.cd/5aD3B9, http://ab.cd/419E9s, http://ab.cd/sfb4Ac
. . .
Container page N: http://ab.cd/sfb4Ac, http://ab.cd/asd31A, http://ab.cd/5aD3B9, http://ab.cd/419E9s
x = #distinct container pages per short URL
Until first year: spam short URLs spread on more container pages than benign ones.
Until second year: spam short URLs almost indistinguishable from benign ones.
Total lifespan of a malicious short URL
[Figure 10, two CDF panels. Legend: Benign URLs, Spam URLs, Malware URLs.]
(a) Distinct short URLs per distinct malicious or benign landing URL. x-axis: number of unique short URLs per distinct landing URL (log scale); y-axis: CDF.
(b) Distinct containers per distinct malicious or benign short URL. x-axis: number of unique container pages per distinct short URL (log scale); y-axis: CDF.
Figure 10. Number of distinct short URLs per unique landing page and distinct container pages per unique short URL. Notably, benign short URLs exhibit different characteristics compared with malicious ones. More precisely, (b) spam short URLs are generally spread on a larger number of container pages than benign ones. Surprisingly, (a) more unique short URLs are created for the same malicious landing page, whereas benign long URLs are typically less aliased. Phishing short URLs are a minority with respect to the others, and thus are not accounted for.
We derived the maximum lifespan of each collected URL based on historical access logs to their container pages. We calculated the maximum lifespan (or simply lifespan) as the delta time between the first and the latest (i.e., most recent) occurrence of each short URL in our database. More specifically, our definition of lifespan accounts for the fact that short URLs may disappear from some container pages and reappear after a while on the same or other container pages. Fig. 11 shows the empirical cumulative distribution function of the lifespan of malicious versus benign short URLs. About 95% of the benign short URLs have a lifetime around 20 days, whereas 95% of the malicious short URLs lasted about 4 months. For example, we observed a spam campaign spanning between April 1st and June 30th 2010 that involved 1,806 malicious short URLs redirecting to junk landing pages; this campaign lasted about three months before being removed by tinyurl.com administrators. The latest MessageLabs Intelligence Annual [...]
[Figure 11 plot. x-axis: lifespan (hours between first and latest occurrence of a short URL in our dataset), 0 to 5000; y-axis: CDF. Series: Benign; Malicious; Malicious (excluding 3-months Apr-Jun spam campaign). Annotation: 3-months spam campaign disseminated via 1,806 tinyurl.com short URLs (April 1st-June 30, 2010).]
Figure 11. Delta time between first and latest occurrence of malicious versus benign short URLs. The “peak” indicates that a large share, about 50%, of spam short URLs lasted about three months. Malicious URLs are usually found on multiple different container pages even for extended periods of time, whereas benign short URLs follow the “one-day-of-fame effect” pattern.
Malicious short URLs typically survive longer than benign ones.
We found a spam campaign (Storm botnet?) with 1,806 short URLs. Deleted by tinyurl.com's admins after 3 months.
x = #hours between the first and latest occurrence of a short URL
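The lifespan definition translates directly into code: for each short URL, take the time between its earliest and its latest sighting, regardless of gaps in between. This is a minimal sketch assuming observations arrive as (short URL, timestamp) pairs; the function name and the sample data are hypothetical.

```python
from datetime import datetime


def lifespans_hours(observations):
    """Hours between first and latest occurrence of each short URL."""
    first, latest = {}, {}
    for short_url, seen_at in observations:
        if short_url not in first or seen_at < first[short_url]:
            first[short_url] = seen_at
        if short_url not in latest or seen_at > latest[short_url]:
            latest[short_url] = seen_at
    return {
        short_url: (latest[short_url] - first[short_url]).total_seconds() / 3600
        for short_url in first
    }


# Toy log: one URL seen over three months, another seen only once.
observations = [
    ("http://ab.cd/sfb4Ac", datetime(2010, 4, 1)),
    ("http://ab.cd/sfb4Ac", datetime(2010, 5, 15)),
    ("http://ab.cd/sfb4Ac", datetime(2010, 6, 30)),
    ("http://ab.cd/asd31A", datetime(2010, 4, 2)),
]
lifespans = lifespans_hours(observations)
```

A short URL that disappears and later reappears still gets a single lifespan, because only the extremes of its sightings matter.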
Conclusions
• Do shortening services take enough countermeasures to protect the users?
• Some of them use blacklists but do not proactively check existing aliases
• Do users stumble upon malicious short URLs very often?
• Not very often: about 0.1% overall.
• Are users aware of malicious short URLs?
• Not much: almost no one clicked on our "+flag as malicious" link. Also confirmed by [Onarlioglu et al., NDSS 2012]
Questions? Federico Maggi (@phretor)
What happens when users click on a short URL?
r     Category           r     Category        r     Category          r     Category
0.00  naturism           0.18  artnudes        0.36  weapons           0.75  shopping
0.01  personalfinance    0.21  antispyware     0.36  cleaning          0.78  games
0.01  do-it-yourself     0.23  drinks          0.37  dating            0.80  news
0.03  pets               0.25  medical         0.39  vacation          0.82  government
0.04  gardening          0.25  weather         0.40  religion          0.88  chat
0.07  clothing           0.30  onlinegames     0.42  culinary          0.90  blog
0.07  mail               0.32  jobsearch       0.45  filehosting       0.91  socialnetworking
0.09  banking            0.33  sportnews       0.52  kidstimewasting   1.00  contraception
0.12  abortion           0.33  gambling        0.55  ecommerce         1.00  childcare
0.12  instantmessaging   0.36  drugs           0.67  adult             1.00  astrology
0.13  jewelry            0.36  searchengines   0.68  audio-video       1.00  cellphones
0.18  hacking            0.36  weapons         0.69  sports            1.00  onlineauctions
                                                                       1.00  onlinepayment
$\rho = \frac{In(cat)}{In(cat) + Out(cat)}$
$\rho \to 0$: many outbound short URLs (aggregators, e.g., Twitter)
$\rho \to 1$: many inbound short URLs (landing pages, e.g., news, blogs)
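The ratio can be computed from a weighted digraph whose nodes are categories and whose edge weights are frequencies of category change. The sketch below follows that description under stated assumptions: the function name, the edge dictionary layout and the three placeholder categories with made-up weights are hypothetical, and a category with no edges on one side is given an average of zero for that side.

```python
from collections import defaultdict


def in_out_ratio(edges):
    """Compute In(cat) / (In(cat) + Out(cat)) for every category.

    `edges` maps (container_category, landing_category) to a weight.
    In(cat) and Out(cat) are the average weights of incoming and
    outgoing edges of the category.
    """
    incoming, outgoing = defaultdict(list), defaultdict(list)
    for (source, destination), weight in edges.items():
        outgoing[source].append(weight)
        incoming[destination].append(weight)

    def mean(weights):
        return sum(weights) / len(weights) if weights else 0.0

    ratios = {}
    for category in set(incoming) | set(outgoing):
        avg_in = mean(incoming[category])
        avg_out = mean(outgoing[category])
        total = avg_in + avg_out
        ratios[category] = avg_in / total if total else None
    return ratios


# Made-up weights between three placeholder categories.
edges = {
    ("cat_a", "cat_b"): 30.0,
    ("cat_a", "cat_c"): 20.0,
    ("cat_b", "cat_c"): 10.0,
}
ratios = in_out_ratio(edges)
```

In this toy graph `cat_a` only emits short URLs and gets a ratio of 0, `cat_c` only receives them and gets 1, and `cat_b` lands in between.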
Are the shortening services similar in terms of content aliased?
Content-specific vs. general-purpose services
Table 3. Ranking of categories by the ratio of incoming and outgoing connections via short URLs.
[...] landing pages. This is the case, for example, of 4sq.com (about 100%), whose short URLs always bring from online social-networking sites to pages categorized as “other”. The most popular shortening services (e.g., bit.ly, goo.gl, ow.ly) fall into the second tier (i.e., 32–48%), together with those services that cover a wide variety of categories, and typically interconnect pages of different categories.
Non-obvious Uses of Short URLs
We also analyzed how short URLs interconnect pages of different categories, to understand whether some categories have a majority of container or landing pages. To this end, we calculated the average frequency of category change from the perspective of the container page and landing page. With this data we created a weighted digraph with 48 nodes, each corresponding to a category. The weights are the frequencies of change, calculated between each pair of categories and averaged over all the short URLs and pages within each category; weights are between 10.19 and 39.41% and distributed as shown in Fig. 8. We then calculate the average weight of incoming, In(cat), and outgoing, Out(cat), edges for each category cat, and finally derive the ratio $r(cat) = \frac{In(cat)}{In(cat) + Out(cat)}$. When $r \to 0$, the category has a majority of outgoing short URLs (i.e., many container pages of such category), whereas $r \to 1$ [...]
[Figure 7 plot. y-axis: median % category drift (0 to 100). x-axis: shortening services: trunc.it, om.ly, slidesha.re, moby.to, migre.me, wp.me, tumblr.com, lnk.ms, cot.ag, flic.kr, icio.us, tiny.ly, amzn.to, post.ly, youtu.be, p.tl, tcrn.ch, tl.gd, nyti.ms, ht.ly, alturl.com, tinysong.com, dld.bz, su.pr, ustre.am, t.co, j.mp, dlvr.it, fb.me, twurl.nl, goo.gl, ow.ly, ff.im, ping.fm, is.gd, tiny.cc, bit.ly, tinyurl.com, mash.to, ur1.ca, sqze.it, cort.as, shar.es, 4sq.com. Dot size: #categories covered (min. 0, max. 48).]
Most popular shorteners are also general-purpose and cover a wide variety of categories
Figure 7. Frequency of change of category (median with 25- and 75-percent quantiles) and number of categories covered (size of black dot) of the top 50 services. The most popular, general-purpose shortening services highlighted are characterized by an ample set of categories (close to 48, which is the maximum) and short URLs that, in 32–48% of the cases, are published on pages having categories different from the landing page category.
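The two quantities in the figure, how often the category changes between container and landing page and how many categories a service covers, can be approximated per service from categorized records. The sketch below is a simplification under stated assumptions: it reports a plain percentage per service rather than the median and quantiles shown in the figure, and the function name and sample records are hypothetical.

```python
from collections import defaultdict


def drift_by_service(records):
    """Per service: (% of short URLs changing category, #categories covered).

    `records` holds (service, container_category, landing_category) tuples,
    one per observed short URL.
    """
    totals = defaultdict(int)
    changed = defaultdict(int)
    categories = defaultdict(set)
    for service, container_cat, landing_cat in records:
        totals[service] += 1
        if container_cat != landing_cat:
            changed[service] += 1
        categories[service].update((container_cat, landing_cat))
    return {
        service: (100 * changed[service] / totals[service],
                  len(categories[service]))
        for service in totals
    }


# Made-up records for a general-purpose and a content-specific service.
records = [
    ("bit.ly", "news", "blog"),
    ("bit.ly", "news", "news"),
    ("bit.ly", "chat", "games"),
    ("bit.ly", "blog", "blog"),
    ("4sq.com", "socialnetworking", "other"),
]
drift = drift_by_service(records)
```

With these toy records the general-purpose service changes category half of the time across four categories, while the content-specific one always crosses from a single container category to a single landing category.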