Cloak and Dagger



Page 1

Cloak and Dagger

Page 2

In a nutshell…

• Cloaking

• Cloaking in search engines

• Search engines’ response to cloaking

• Lifetime of cloaked search results

• Cloaked pages in search results

Page 3

• Ubiquity of advertising on the Internet.

• Search, by and large, enjoys primacy as an advertising medium.

• Search Engine Optimization (SEO): doctoring of search results.

• For benign ends such as simplifying page content, optimizing load times, etc.

• For malicious purposes such as manipulating page ranking algorithms.

Page 4

Cloaking

• Conceals the true nature of a Web site

• Keyword Stuffing – Associating benign content with keywords

• Attracting traffic to scam pages

• Protecting the Web servers from being exposed

• Not scamming those who arrive at the site via different keywords.

Page 5

Types of Cloaking

• Repeat Cloaking

• User Agent Cloaking

• Referrer Cloaking (sometimes also called “Click-through Cloaking”)

• IP Cloaking
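The four types differ mainly in which signal the server inspects before choosing what to serve. The sketch below is illustrative only: the `Visit` record, the `signals_seen` helper, and the simple substring checks are assumptions made for this example, not something taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    user_agent: str    # HTTP User-Agent header
    referrer: str      # HTTP Referer header ("" if absent)
    reverse_dns: str   # hostname obtained by reverse DNS lookup of the client IP
    seen_before: bool  # whether this client has visited the site before


def signals_seen(visit: Visit) -> dict:
    """Report what each type of cloaking would conclude about one visit."""
    return {
        # Repeat cloaking: distinguish first-time from returning visitors.
        "repeat": "returning" if visit.seen_before else "first-time",
        # User-agent cloaking: recognise a crawler by its User-Agent string.
        "user_agent": "crawler" if "Googlebot" in visit.user_agent else "browser",
        # Referrer cloaking: check whether the visitor clicked through a search result.
        "referrer": "click-through" if "google." in visit.referrer else "direct",
        # IP cloaking: recognise a crawler by where its IP address resolves.
        "ip": "crawler" if visit.reverse_dns.endswith(".googlebot.com") else "other",
    }
```

A visit carrying a Googlebot user agent from a host that resolves under `googlebot.com` would be flagged as a crawler by both the user-agent and the IP checks, while a browser arriving from a search results page would be flagged as a click-through.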

Page 6

DAGGER

Dagger encompasses five different functions –

• Collection of search terms

• Querying search results generated by search engines

• Crawling search results

• Detecting cloaking

• Repeating the above four processes to study variance in measurements
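The five functions can be read as a loop over four stages. The skeleton below is only a sketch of that reading: the stage names (`collect_terms`, `query_engine`, `crawl`, `is_cloaked`) are hypothetical placeholders passed in as callables, and nothing here reflects how Dagger is actually implemented.

```python
from typing import Callable, Dict, Iterable, List


def run_dagger_round(
    collect_terms: Callable[[], Iterable[str]],
    query_engine: Callable[[str], Iterable[str]],
    crawl: Callable[[str], Dict],
    is_cloaked: Callable[[Dict], bool],
) -> List[str]:
    """One measurement round: terms -> search results -> crawled views -> verdicts."""
    cloaked = []
    for term in collect_terms():
        for url in query_engine(term):
            views = crawl(url)        # fetch the URL as seen by different visitors
            if is_cloaked(views):     # compare the views to detect cloaking
                cloaked.append(url)
    return cloaked


def run_dagger(rounds: int, **stages) -> List[List[str]]:
    """Repeat the round so that variance between measurements can be studied."""
    return [run_dagger_round(**stages) for _ in range(rounds)]
```

Passing the stages in as callables keeps the sketch independent of any particular search engine or crawler.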

Page 7

Collection of Search Terms

Two different kinds of cloaked search terms are targeted:

• TYPE 1: Search terms that contain popular words.

• Aimed at gathering high volumes of undifferentiated traffic.

• TYPE 2: Search terms that reflect highly targeted traffic.

• Here cloaked content matches the cloaked search terms.

Page 8

• TYPE 1: Use popular trending search terms.

• Google Hot Searches terms shed light on search-engine-based data collection methods.

• Alexa terms shed light on client-based data collection methods.

• Twitter terms clue us in on social networking trends.

• The cloaked page is entirely unrelated to the trending search terms.

• TYPE 2: A set of terms catering to a specific domain.

• Content of the cloaked pages actually matches the search terms.

Page 9

Querying Search Results

• Terms collected in the previous step are fed to the search engines.

• Study the prevalence of cloaking across engines

• Examine their response to cloaking.

• The top 100 search results and accompanying metadata are compiled into a list.

• Entries from “known good” domains are eliminated in order to reduce false positives during data processing.

• Similar entries are grouped together with an appropriate ‘count’.
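The filtering and grouping steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `KNOWN_GOOD` set is a made-up whitelist, and "similar entries" is simplified here to mean identical URLs.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical whitelist; the actual "known good" list is not given in the slides.
KNOWN_GOOD = {"wikipedia.org", "youtube.com"}


def is_known_good(url: str) -> bool:
    """True if the URL's host is a known-good domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in KNOWN_GOOD)


def compile_results(urls) -> Counter:
    """Drop known-good entries, then group identical URLs with a count."""
    return Counter(u for u in urls if not is_known_good(u))
```

Matching on the host rather than on the raw URL string avoids treating a URL that merely mentions a whitelisted name in its path as known good.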

Page 10

Crawling Search Results

• Crawl the URLs.

• Process the fetched pages

• Detect cloaking in parallel

• Helps minimize any possible time of day effects.

• Multiple crawls

Page 11

• Normal search user

• Googlebot Web crawler

• A user who does not click through the search result

• Detect pure user-agent cloaking without any checks on the referrer.

• 35% of cloaked search results for a single measurement perform pure user-agent cloaking.

• Pages that employ both user-agent and referrer cloaking are nearly always malicious.

• IP Cloaking - half of current cloaked search results do in fact employ IP cloaking via reverse DNS lookups.
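The three crawler personas listed above differ only in the request headers they send. The sketch below prepares the three requests without sending them; the User-Agent strings, the `build_views` helper, and the decision rule in `cloaking_kind` are illustrative assumptions about how such a comparison could be set up, not the slides' stated method.

```python
from urllib.request import Request

# Illustrative User-Agent strings; any realistic browser string would do.
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0"
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_views(url: str, search_url: str) -> dict:
    """Prepare (but do not send) the three requests used to view one URL."""
    return {
        # Normal search user: browser User-Agent, arriving from the results page.
        "search_user": Request(url, headers={"User-Agent": BROWSER_UA, "Referer": search_url}),
        # Search crawler: Googlebot User-Agent, no referrer.
        "googlebot": Request(url, headers={"User-Agent": GOOGLEBOT_UA}),
        # User who did not click through: browser User-Agent, no referrer.
        "no_click_through": Request(url, headers={"User-Agent": BROWSER_UA}),
    }


def cloaking_kind(user_differs: bool, no_click_differs: bool) -> str:
    """Both flags say whether a view differs from the Googlebot view."""
    if no_click_differs:
        # The page changes on User-Agent alone, with no referrer present.
        return "pure user-agent cloaking"
    if user_differs:
        # The page changes only when a search referrer is present.
        return "cloaking that checks the referrer"
    return "no cloaking observed"
```

The third persona is what separates the cases: if a browser with no referrer already sees different content from Googlebot, the referrer cannot be part of the check.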

Page 12

Detecting Cloaking

• Process the crawled data using multiple iterative passes

• Various transformations and analyses are applied

• This helps compile the information needed to detect cloaking.

• Each pass uses a comparison based approach:

• Apply the same transformations to the views of the same URL, as seen by the user and by the crawler

• Directly compare the result of the transformation using a scoring function

• Thresholding - detect pages that are actively cloaking and annotate them.

• Used for later analysis.
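One pass of the comparison-based approach above might look like the following sketch. The specific transformation (word sets), the Jaccard-style scoring function, and the `THRESHOLD` value are stand-ins chosen for illustration; the slides do not say which transformations, scores, or thresholds are actually used.

```python
import re


def to_words(html: str) -> set:
    """Transformation: strip tags and keep the set of lowercase words."""
    text = re.sub(r"<[^>]+>", " ", html)
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def difference_score(user_html: str, crawler_html: str) -> float:
    """Score in [0, 1]: 0 for identical word sets, 1 for disjoint ones."""
    a, b = to_words(user_html), to_words(crawler_html)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


THRESHOLD = 0.5  # hypothetical cut-off, chosen only for this example


def is_cloaked(user_html: str, crawler_html: str, threshold: float = THRESHOLD) -> bool:
    """Thresholding: flag the URL when the two views differ too much."""
    return difference_score(user_html, crawler_html) > threshold
```

Applying the same transformation to both views before scoring means that markup-only differences do not count, since the tags are stripped before the word sets are compared.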

Page 13

Temporal Re-measurement

• To study the lifetime of cloaked pages.

• Temporal component in Dagger.

• Fetch search results from search engines

• Crawl and process URLs at later instances of time.

• Measure the rate at which search engines respond to cloaking

• Measure the duration pages are cloaked
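Given repeated crawls of the same URL, the duration measurement reduces to simple bookkeeping over timestamped observations. The sketch below assumes each observation is a hypothetical `(timestamp, was_cloaked)` pair with `datetime` timestamps; the helper names are invented for this example.

```python
from datetime import timedelta


def cloaked_duration(observations) -> timedelta:
    """Span between the first and last crawl at which the page was cloaked.

    `observations` is a list of (timestamp, was_cloaked) pairs for one URL.
    """
    cloaked_at = sorted(t for t, was_cloaked in observations if was_cloaked)
    if not cloaked_at:
        return timedelta(0)
    return cloaked_at[-1] - cloaked_at[0]


def fraction_lasting(durations, cutoff: timedelta) -> float:
    """Share of pages whose cloaked duration is at least `cutoff`."""
    if not durations:
        return 0.0
    return sum(d >= cutoff for d in durations) / len(durations)
```

Because the page is only observed at crawl times, the measured span is a lower bound on how long the page was really cloaked.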

Page 14

Cloaking Over Time

• In trending searches the terms constantly change.

• Cloakers target many more search terms and a broad demographic of potential victims

• Pharmaceutical search terms are static

• Represent product searches in a very specific domain.

• Cloakers have much more time to perform SEO to raise the rank of their cloaked pages.

• This results in more cloaked pages in the top results.

Page 15

Sources of Search Terms

• Blackhat SEO – artificially boost the rankings of cloaked pages.

• Search engines detect cloaking either directly (analyzing pages) or indirectly (updating the ranking algorithm).

• Augmenting popular search terms with suggestions.

• Enables targeting the same semantic topic as popular search terms.

• Cloaking in search results is highly influenced by the search terms.

Page 16

Search Engine Response

• Search engines try to identify and thwart cloaking.

• Cloaked pages do regularly appear in search results.

• Many are removed or suppressed by the search engines within hours to a day.

• Cloaked search results rapidly begin to fall out of the top 100 within the first day, with a more gradual drop thereafter.

Page 17

Cloaking Duration

• Cloakers manage their pages similarly, independent of the search engine.

• Pages are cloaked for long durations: over 80% remain cloaked past seven days.

• Cloakers will want to maximize the time during which they might benefit from cloaking by attracting customers to scam sites, or victims to malware sites.

• Difficult to recycle a cloaked page to reuse at a later time.

Page 18

Cloaked Content

• Redirection of users through a chain of advertising networks.

• About half of the time a cloaked search result leads to some form of abuse.

• Long-term SEO campaigns constantly change the search terms they are targeting and the hosts they are using.

Page 19

Domain Infrastructure

• Key resources for effectively deploying cloaking in a scam:

• Access to Web sites

• Access to domains

• For TYPE 1 terms, the majority of cloaked search results are in .com.

• For TYPE 2 terms, cloakers use the “reputation” of pages to boost their ranking in search results.

Page 20

Search Engine Optimization

• Since a major motivation for cloaking is to attract user traffic, we can extrapolate SEO performance based on the search result positions the cloaked pages occupy.

• Cloaking for TYPE 1 terms targets popular terms that are very dynamic, leaving limited time and heavy competition for performing SEO on those search terms.

• Cloaking for TYPE 2 terms is a highly focused task on a static set of terms.

• This provides much longer time frames for performing SEO on cloaked pages for those terms.

Page 21

Conclusion

• Cloaking has become a standard tool in the scammer’s toolbox.

• Cloaking adds significant complexity for differentiating legitimate Web content from fraudulent pages.

• The majority of cloaked search results remain high in the rankings for 12 hours.

• The pages themselves can persist far longer.

• Search engine providers will need to further reduce the lifetime of cloaked results to demonetize the underlying scam activity.