Cloak of Visibility: Detecting When Machines Browse a Different Web
Luca Invernizzi*, Kurt Thomas*, Alexandros Kapravelos†,
Oxana Comanescu*, Jean-Michel Picod*, and Elie Bursztein*
* Google - Anti-fraud and abuse research † North Carolina State University
Web cloaking
Search: Effective for Search Engine Optimization
Ads: Effective to infringe policies
Malware: Effective to evade security crawlers
Blackmarket Investigation
Acquired
Top 10 cloaking software samples
Can’t go wrong with Cloaky McCloakyFace.
I swear by NowYouSeeMe!
Input keywords => http://money.site
Features
● Find similar sites through SERPs
● Content/Template spinning
● Drip-feeding
Added services
● Plagiarism detection
● SERP ranking
Admin interface
Technique: referer-based cloaking
GET /  Referer: ...tiffany+cheap...
GET /  Referer: blank
GET /  Referer: ...tiffany...
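A minimal sketch of how a referer-based check might work on the cloaking server. The slide does not say which of the three requests receives which page, so the rule used here (every target keyword must appear in the search query) is an assumption, as are the `q` parameter name, the keyword list, and the function names.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical keyword list; a real kit would load this from its configuration.
TARGET_KEYWORDS = {"tiffany", "cheap"}

def search_terms(referer):
    """Extract lower-cased search terms from the 'q' parameter of a Referer URL."""
    if not referer:
        return set()
    query = parse_qs(urlparse(referer).query)
    terms = set()
    for value in query.get("q", []):
        # parse_qs has already decoded '+' into spaces.
        terms.update(value.lower().split())
    return terms

def choose_page(referer):
    """Return 'storefront' if all target keywords are in the query, else 'benign'."""
    if TARGET_KEYWORDS <= search_terms(referer):
        return "storefront"
    return "benign"
```

Under this assumed rule, a visitor arriving with a blank Referer, or from a query missing one of the keywords, is shown the benign page.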
Technique: IP blacklisting
Blacklisted IPs: 51m
Subnets: 983
Security companies: 30
Hacking collectives: 2
Proxy networks: 3
Entities (companies, universities, registrars): 122
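As an illustration, a blacklist lookup of this kind can be sketched with Python's standard `ipaddress` module. The subnets below are reserved documentation ranges standing in for a real blacklist, and `is_blacklisted` is a hypothetical name.

```python
import ipaddress

# Documentation-only ranges (RFC 5737), standing in for a kit's real blacklist.
BLACKLISTED_SUBNETS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def is_blacklisted(ip):
    """Return True if the visitor IP falls inside any blacklisted subnet."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLACKLISTED_SUBNETS)
```

A linear scan is enough for a few hundred subnets; a list with tens of millions of individual IPs would more plausibly sit in a set or a prefix tree.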
Technique: rDNS cloaking
66.249.66.1
Host 66.249.66.1?
crawl.googlebot.com.
Google (.*1e100.*, .*google.*)
Microsoft, Yahoo, Yandex, Baidu, Ask, Rambler, DirectHit, Theoma
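A rough sketch of the reverse-DNS check: resolve the visitor's IP to a hostname, then match the hostname against per-crawler patterns. The two Google patterns are the ones shown on the slide; patterns for the other crawlers are not given there and are left out. The function names are hypothetical.

```python
import re
import socket

# Google patterns come from the slide; other crawlers' patterns are not shown there.
CRAWLER_PATTERNS = {
    "Google": [re.compile(r".*1e100.*"), re.compile(r".*google.*")],
}

def crawler_for_hostname(hostname):
    """Return the crawler name whose pattern matches the hostname, or None."""
    name = hostname.lower()
    for crawler, patterns in CRAWLER_PATTERNS.items():
        if any(p.match(name) for p in patterns):
            return crawler
    return None

def crawler_for_ip(ip):
    """Reverse-resolve an IP and classify the resulting hostname (needs network access)."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:
        # No PTR record or resolver failure: treat the visitor as unclassified.
        return None
    return crawler_for_hostname(hostname)
```

For example, `crawler_for_hostname("crawl.googlebot.com.")` returns `"Google"`, while a hostname matching no pattern returns `None`.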
Geolocation: country, city, carrier level.
Flash/JS support & fingerprints
User-Agent
More techniques
JS
Browser farm
Pretend Google bots: User-Agent: GoogleBot; Referer: blank; Google IP
Simple honey clients: User-Agent: Chrome; Referer: blank, or simple; Cloud provider IPs
Realistic honey clients: User-Agent: Chrome; Referer: context-aware; Residential and mobile IPs
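The three crawler profiles can be written down as plain configuration. This is only a sketch: it assumes each label pairs with the User-Agent, Referer, and IP settings as read from the slide, the User-Agent values are the slide's shorthand rather than real header strings, and `CrawlProfile` and `headers_for` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CrawlProfile:
    name: str
    user_agent: str   # slide shorthand, not a full User-Agent string
    referer: str      # "blank", "blank or simple", or "context-aware"
    ip_pool: str      # where the crawler's traffic originates

PROFILES = [
    CrawlProfile("Pretend Google bot", "GoogleBot", "blank", "Google IP"),
    CrawlProfile("Simple honey client", "Chrome", "blank or simple", "Cloud provider IPs"),
    CrawlProfile("Realistic honey client", "Chrome", "context-aware",
                 "Residential and mobile IPs"),
]

def headers_for(profile, search_url=""):
    """Build request headers; only context-aware profiles send the search URL as Referer."""
    headers = {"User-Agent": profile.user_agent}
    if profile.referer == "context-aware" and search_url:
        headers["Referer"] = search_url
    return headers
```

Fetching the same URL once per profile and comparing the responses is what exposes a page that serves different content to different visitors.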
wget wget
I’m real!
Features
            HTML                 Image
Syntactic   Content similarity   Screenshot similarity
Semantic    Topic similarity     Screenshot topic similarity
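To make the syntactic HTML feature concrete, here is one possible content-similarity score between the page served to two different crawler profiles. The slides do not name the measure used; Jaccard similarity over word shingles is a stand-in chosen for illustration, and the function names are hypothetical.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles in the text (lower-cased)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts contribute a single shingle, or none if empty.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def content_similarity(page_a, page_b, k=3):
    """Jaccard similarity of the two pages' shingle sets, in [0.0, 1.0]."""
    a, b = shingles(page_a, k), shingles(page_b, k)
    if not a and not b:
        return 1.0  # two empty pages are treated as identical
    return len(a & b) / len(a | b)
```

A low score between what a bot-like profile and a user-like profile receive is a signal of cloaking, though dynamic content on legitimate sites also lowers it, which is why the slide pairs it with semantic and screenshot features.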
95k labeled samples: 75k legitimate websites (Alexa) + 20k cloaked storefronts
Classification
False positive rate: 0.9%
True positive rate: 82%
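For reference, the two rates are computed from confusion-matrix counts as sketched below. The counts in the usage note are made-up round numbers chosen to reproduce the reported percentages; they are not the study's actual counts.

```python
def false_positive_rate(false_positives, true_negatives):
    """Share of legitimate pages wrongly flagged as cloaking: FP / (FP + TN)."""
    return false_positives / (false_positives + true_negatives)

def true_positive_rate(true_positives, false_negatives):
    """Share of cloaked pages correctly flagged: TP / (TP + FN)."""
    return true_positives / (true_positives + false_negatives)
```

With illustrative counts, flagging 9 of 1,000 legitimate pages gives `false_positive_rate(9, 991)`, a rate of 0.9%, and catching 82 of 100 cloaked pages gives `true_positive_rate(82, 18)`, a rate of 82%.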
Prevalence
11.7%: Cloaking pages in Google Search, for luxury storefront keywords.
4.9%: Cloaking pages in Google AdWords, for health and software ads.
Future: client-side detection
Search/Ads links add a parameter with the topics
found by the bot. Check that the page matches the same topics.
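One way this proposal could look in code, as a sketch only: the link is tagged with the topics the bot observed, and the client compares them with the topics it sees on the loaded page. The parameter name `bot_topics`, the 50% overlap threshold, and all function names are assumptions; how topics are extracted from a page is out of scope here.

```python
from urllib.parse import urlencode, urlparse, urlunparse, parse_qs

TOPIC_PARAM = "bot_topics"  # hypothetical parameter name

def tag_link(url, topics):
    """Return the URL with the bot-observed topics added as a query parameter."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query[TOPIC_PARAM] = [",".join(sorted(topics))]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))

def topics_from_link(url):
    """Read the bot-observed topics back out of a tagged URL."""
    values = parse_qs(urlparse(url).query).get(TOPIC_PARAM, [])
    return set(values[0].split(",")) if values else set()

def page_matches(url, client_topics, min_overlap=0.5):
    """True if enough of the bot's topics also appear on the page the client sees."""
    expected = topics_from_link(url)
    if not expected:
        return True  # untagged link: nothing to compare against
    overlap = len(expected & set(client_topics)) / len(expected)
    return overlap >= min_overlap
```

A link tagged with `{"jewelry", "luxury"}` would pass for a client page about jewelry and luxury goods, and fail for a client page whose only topic is unrelated.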
Takeaways
Prevalence: 5% of ads and 12% of search results for cloaking-prone keywords cloak.
Techniques: IP/User-Agent/Referer only gets ⅕ of cloaking.
Moving forward: Client side, semantic features needed for hard cases.
Thank you!
Luca Invernizzi [email protected]