Cloak and Dagger
In a nutshell: Cloaking. Cloaking in search engines. Search engines' response to cloaking. Lifetime of cloaked search results. Cloaked pages in search results.
Advertising is ubiquitous on the Internet, and search, by and large, enjoys primacy. Search engine optimization (SEO) is the doctoring of search results. It can serve benign ends, such as simplifying page content or optimizing load times, or malicious purposes, such as manipulating page ranking algorithms.
Cloaking: Conceals the true nature of a Web site. Uses include keyword stuffing (associating benign content with keywords), attracting traffic to scam pages, protecting the Web servers from being exposed, and not scamming those who arrive at the site via different keywords.
Types of Cloaking: repeat cloaking, user-agent cloaking, referrer cloaking (sometimes also called click-through cloaking), and IP cloaking.
DAGGER: Dagger encompasses five different functions. (1) Collecting search terms. (2) Querying the search results generated by search engines. (3) Crawling the search results. (4) Detecting cloaking. (5) Repeating the above four processes to study variance in measurements.
Collection of Search Terms: Two different kinds of cloaked search terms are targeted. TYPE 1: search terms that contain popular words, aimed at gathering high volumes of undifferentiated traffic. TYPE 2: search terms that reflect highly targeted traffic; here the cloaked content matches the cloaked search terms.
TYPE 1: Use popular trending search terms. Google Hot Searches and Alexa terms shed light on search-engine-based and client-based data collection methods, respectively, while Twitter terms clue us in on social networking trends. The cloaked page is entirely unrelated to the trending search terms. TYPE 2: A set of terms catering to a specific domain. The content of the cloaked pages actually matches the search terms.
Querying Search Results: The terms collected in the previous step are fed to the search engines in order to study the prevalence of cloaking across engines and examine their response to cloaking. The top 100 search results and accompanying metadata are compiled into a list. Entries from known good domains are eliminated in order to avoid false positives during data processing. Similar entries are grouped together with an appropriate count.
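The filtering and grouping step can be illustrated with a minimal Python sketch. The whitelist contents, the function names, and the choice to treat only identical URLs as "similar" are assumptions made for illustration; the slides do not say which domains were considered known good or how similarity was defined.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical whitelist; the actual known-good domains are not given in the slides.
KNOWN_GOOD = {"wikipedia.org", "youtube.com"}

def is_known_good(host, whitelist=KNOWN_GOOD):
    """Return True if host is a whitelisted domain or a subdomain of one."""
    host = host.lower()
    return any(host == d or host.endswith("." + d) for d in whitelist)

def compile_results(urls, whitelist=KNOWN_GOOD):
    """Drop known-good entries, then group identical URLs with a count.

    Assumption: "similar" is approximated here by exact URL equality.
    """
    kept = [u for u in urls
            if not is_known_good(urlsplit(u).hostname or "", whitelist)]
    return Counter(kept)

results = compile_results([
    "http://en.wikipedia.org/wiki/X",
    "http://example.com/a",
    "http://example.com/a",
    "http://shop.example.net/",
])
```

Dropping whitelisted domains before crawling also reduces the number of pages that have to be fetched in the next step.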
Crawling Search Results: Crawl the URLs, process the fetched pages, and detect cloaking in parallel, which helps minimize any possible time-of-day effects. Multiple crawls are performed.
Each URL is crawled from three views: a normal search user, the Googlebot Web crawler, and a user who does not click through the search result. The third view makes it possible to detect pure user-agent cloaking without any checks on the referrer. 35% of cloaked search results for a single measurement perform pure user-agent cloaking. Pages that employ both user-agent and referrer cloaking are nearly always malicious. IP cloaking: half of current cloaked search results do in fact employ IP cloaking via reverse DNS lookups.
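The three crawl views differ only in the HTTP request headers they present. The sketch below shows one way to build them in Python; the user-agent strings, the referrer URL format, and the function names are illustrative assumptions, not the headers Dagger actually sent.

```python
import urllib.parse
import urllib.request

# Illustrative header values (assumptions, not taken from the slides).
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0"

def build_view_headers(search_term):
    """Return the request headers for each of the three crawl views."""
    referrer = "http://www.google.com/search?q=" + urllib.parse.quote_plus(search_term)
    return {
        # Normal search user: browser user agent, arrives via the result page.
        "search_user": {"User-Agent": BROWSER_UA, "Referer": referrer},
        # Search engine crawler: crawler user agent, no referrer.
        "crawler": {"User-Agent": GOOGLEBOT_UA},
        # User who does not click through: browser user agent, no referrer.
        "direct_user": {"User-Agent": BROWSER_UA},
    }

def fetch(url, headers, timeout=10):
    """Fetch one view of a URL and return the raw response body."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()

views = build_view_headers("cheap viagra")
```

Comparing the `crawler` and `direct_user` fetches isolates user-agent cloaking, since neither request carries a referrer. Note that a spoofed user agent cannot expose IP cloaking, because a site that checks the client address by reverse DNS lookup will not treat this crawler as Googlebot.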
Detecting Cloaking: The crawled data is processed in multiple iterative passes, each applying various transformations and analyses to compile the information needed to detect cloaking. Each pass uses a comparison-based approach: apply the same transformations to the views of the same URL as seen from the user and the crawler, then directly compare the results of the transformation using a scoring function. Thresholding on the score detects pages that are actively cloaking; these are annotated for later analysis.
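A single comparison pass can be sketched as follows. The slides do not specify the transformations, the scoring function, or the threshold, so this example substitutes a simple stand-in: it strips HTML tags, tokenizes both views, and scores them by Jaccard distance. The function names and the 0.7 threshold are hypothetical.

```python
import re

def normalize(html):
    """Transformation applied identically to both views:
    strip tags, lowercase, and reduce the page to a set of word tokens."""
    text = re.sub(r"<[^>]+>", " ", html)
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def difference_score(user_html, crawler_html):
    """Jaccard distance between the token sets (0.0 = identical, 1.0 = disjoint).
    This is a stand-in scoring function, not the one used by Dagger."""
    a, b = normalize(user_html), normalize(crawler_html)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def is_cloaked(user_html, crawler_html, threshold=0.7):
    """Thresholding: flag the URL when the two views differ too much.
    The threshold value is an arbitrary assumption."""
    return difference_score(user_html, crawler_html) > threshold

same = is_cloaked("<p>Celebrity news today</p>", "<h1>celebrity news today</h1>")
different = is_cloaked("<p>Buy cheap pills now</p>", "<h1>Celebrity news today</h1>")
```

Running several passes, each with a cheap filter like this one, lets later and more expensive analyses focus on the pages that earlier passes could not rule out.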
Temporal Re-measurement: Dagger includes a temporal component to study the lifetime of cloaked pages. It fetches search results from the search engines, then crawls and processes the URLs at later instances of time. This measures the rate at which search engines respond to cloaking and the duration for which pages remain cloaked.
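Given repeated crawls of the same URLs, the cloaking duration can be estimated from the per-URL history of detections. The sketch below assumes a hypothetical data layout, a mapping from URL to a list of `(hours_since_first_crawl, is_cloaked)` observations; the layout and function names are not taken from the slides.

```python
def cloaked_duration_hours(observations):
    """Hours from the first crawl to the last consecutive crawl
    at which the page was still observed to be cloaking."""
    duration = 0
    for hours, cloaked in sorted(observations):
        if not cloaked:
            break
        duration = hours
    return duration

def fraction_cloaked_past(histories, hours):
    """Fraction of URLs still cloaked at least `hours` after the first crawl."""
    if not histories:
        return 0.0
    survivors = sum(1 for obs in histories.values()
                    if cloaked_duration_hours(obs) >= hours)
    return survivors / len(histories)

# Hypothetical crawl histories for two URLs, re-measured at 24 h and 7 days.
histories = {
    "http://example.com/a": [(0, True), (24, True), (168, True)],
    "http://example.net/b": [(0, True), (24, False), (168, False)],
}
still_cloaked_after_week = fraction_cloaked_past(histories, 7 * 24)
```

The resolution of such an estimate is limited by the re-crawl interval: a page that stops cloaking between two crawls is only noticed at the later one.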
Cloaking Over Time: In trending searches the terms constantly change, so cloakers target many more search terms and a broad demographic of potential victims. Pharmaceutical search terms are static and represent product searches in a very specific domain. Cloakers there have much more time to perform SEO to raise the rank of their cloaked pages, which results in more cloaked pages in the top results.
Sources of Search Terms: Blackhat SEO artificially boosts the rankings of cloaked pages. Search engines detect cloaking either directly (analyzing pages) or indirectly (updating the ranking algorithm). Augmenting popular search terms with suggestions enables targeting the same semantic topic as the popular search terms. Cloaking in search results is highly influenced by the search terms.
Search Engine Response: Search engines try to identify and thwart cloaking. Cloaked pages do regularly appear in search results, but many are removed or suppressed by the search engines within hours to a day. Cloaked search results rapidly begin to fall out of the top 100 within the first day, with a more gradual drop thereafter.
Cloaking Duration: Cloakers manage their pages similarly, independent of the search engine. Pages are cloaked for long durations: over 80% remain cloaked past seven days. Cloakers want to maximize the time during which they benefit from cloaking by attracting customers to scam sites or victims to malware sites. It is difficult to recycle a cloaked page for reuse at a later time.
Cloaked Content: Users are redirected through a chain of advertising networks. About half of the time, a cloaked search result leads to some form of abuse. Long-term SEO campaigns constantly change the search terms they are targeting and the hosts they are using.
Domain Infrastructure: The key resources for effectively deploying cloaking in a scam are access to Web sites and access to domains. For TYPE 1 terms, the majority of cloaked search results are in .com. For TYPE 2 terms, cloakers use the reputation of pages to boost their ranking in search results.
Search Engine Optimization: Since a major motivation for cloaking is to attract user traffic, we can extrapolate SEO performance from the search result positions the cloaked pages occupy. Cloaking TYPE 1 terms targets popular terms that are very dynamic, with limited time and heavy competition for performing SEO on those search terms. Cloaking TYPE 2 terms is a highly focused task on a static set of terms, which provides much longer time frames for performing SEO on cloaked pages for those terms.
Conclusion: Cloaking has become a standard tool in the scammer's toolbox, and it adds significant complexity to differentiating legitimate Web content from fraudulent pages. The majority of cloaked search results remain high in rankings for 12 hours, and the pages themselves can persist far longer. Search engine providers will need to further reduce the lifetime of cloaked results to demonetize the underlying scam activity.