Web Scrapers and Your Property Portal: High Risk Lessons

17 00 distil rami


Page 1: 17 00 distil rami

Web Scrapers and Your Property Portal: High Risk Lessons

Katherine Oberhofer
Questions to answer regarding presentation:
- Is it a presentation or an interview?
- Is there anyone presenting with you?
- Do you have any unique requirements (e.g. video / audio)?
Page 2: 17 00 distil rami

Speaker

Rami Essaid
CEO, Distil Networks

Page 3: 17 00 distil rami

Awards and Analyst Recognition

“Distil’s ability to analyze behavior provides the best chance of detecting and blocking bot-driven attacks.”

5 Stars across the board. “Verdict: For monitoring the impact of bots on a network this is the tool one needs.”

The only anti-bot solution to be included in Gartner’s Online Fraud Detection Market Guide

Ovum puts Distil Networks On The Radar. “Clear innovation compared to similar services.”

Page 4: 17 00 distil rami

Fortune 500 & Alexa Global 10,000 Customers

Ecommerce

Travel

Publishers

Directories

Traditional Media

Marketplace

Services

Page 5: 17 00 distil rami

Distil Protects Over 50 RE Portals Globally!

Page 6: 17 00 distil rami

Protecting Your Data

Enhancing Your Data

Cleaning-Up Your Data

Page 7: 17 00 distil rami

Protecting Your Data

A Brief Intro to Bots and Web Scraping

Page 8: 17 00 distil rami

What Is Web Scraping?

Web Scraping
Also known as screen scraping, web scraping is the act of copying large amounts of data from a website, either manually or with an automated program.

Legitimate Scraping
Scraping can sometimes be benevolent and totally acceptable. For example, search engine bots scrape your website in order to index it.

Malicious Scraping
A systematic theft of intellectual property accessible on a website, including pricing, content, images, and proprietary data.

Page 9: 17 00 distil rami

Who is behind Web Scraping?

Competitors
○ Content Theft
○ Competitive Intel
○ Price Scraping

Aggregators
○ Start-ups
○ Unauthorized Middlemen

Hackers
○ Content for Fake Pages

Search Engines
○ Google
○ Bing
○ Yahoo
○ Baidu

Page 10: 17 00 distil rami

Bad Bots Cause the Majority of Website Problems

Page 11: 17 00 distil rami

In 2015 the most targeted verticals were digital publishing and real estate. Real estate sites saw a 300% increase in bad bot traffic!

Traffic by Type of Site, 2014 vs 2015

Page 12: 17 00 distil rami

Bad Web Scraping

Web scraping is the act of taking content from a website with the intent of using it for purposes outside the direct control of the site owner.

It can be used to
○ Steal intellectual property
○ Gain competitive advantage
○ Create aggregation or meta-sites
○ Perform market research
○ Damage SEO rankings

Page 13: 17 00 distil rami

Alexa – monitor traffic levels

SE Ranking – track search rankings

InfiniGraph – watch social media trends

Open Site Explorer – monitor backlinks

SpyFu – view advertising keywords

Page 14: 17 00 distil rami

Moat – find where ads are running

iSpionage – organic search keywords

Compete PRO – get demographic info

Quantcast – view audience insights

SpyOnWeb – see behind the curtain

Page 15: 17 00 distil rami

What is Contributing to the Growth in Web Scraping?

○ Cheap scraping software
○ Inexpensive cloud computing resources
○ Botnet-as-a-Service

Page 16: 17 00 distil rami

The Going Rate for Scraping: Less than $130/day

Freelancer.com Rates
○ Scraping three real estate sites
○ Data manipulation (de-duping, etc.)
○ Importing into new software

Average Cost: $130 USD

Page 17: 17 00 distil rami

Posting Stolen Data is Quick and Easy due to Turnkey Platforms

Real Estate Portal Platforms start at $299

Page 18: 17 00 distil rami

The Cost of Replicating your Website

Scraped Data: $130
Classified Ad Website: $299
Total: $429

Page 19: 17 00 distil rami

Why Bots / Scraping is a Problem in Real Estate

Bottom Line: Scrapers scrape because they are making money with your listings!

And the Real Estate industry is left with...
○ Higher Costs
○ Lost Revenues

Page 20: 17 00 distil rami

Case Study

Page 21: 17 00 distil rami

Enhancing Your Data

Page 22: 17 00 distil rami

Delivering a Clear Picture of Your Web Traffic

Low Resolution Fingerprint: “Unactionable”

Hi-Def Fingerprint: “Actionable”

Page 23: 17 00 distil rami

Hi-Def Fingerprinting Eliminates Blind Defense

○ IP Address
○ Header & User Agent Information
○ Cookie
○ Browser
○ 200+ attributes of data: Navigator, WebGL, Plugins, Audio, Video, etc.
○ Tamper proofing layer

Hi-Def Fingerprint
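
A minimal sketch of how a set of client attributes could be folded into one fingerprint, assuming a SHA-256 hash over a canonical JSON encoding. The function name, attribute names, and values are hypothetical, and this is not a description of Distil's actual method. It only illustrates the idea that a fingerprint built from device attributes stays the same when the IP address changes.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Return a stable hex digest for a dict of client attributes."""
    # Sort keys so the same attributes always serialize to the same string.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical device attributes (illustrative values only).
device = {
    "user_agent": "Mozilla/5.0",
    "navigator_platform": "Linux x86_64",
    "webgl_renderer": "ExampleGPU",
    "plugins": ["pdf-viewer"],
    "audio_sample_rate": 44100,
}

# The same device seen from two different IP addresses.
request_a = {"ip": "198.51.100.1", "device": device}
request_b = {"ip": "198.51.100.2", "device": device}

# The fingerprint is computed from device attributes only, not the IP.
fp_a = fingerprint(request_a["device"])
fp_b = fingerprint(request_b["device"])

# A device that differs in a single attribute gets a different fingerprint.
other_device = dict(device, webgl_renderer="OtherGPU")
fp_other = fingerprint(other_device)
```

Here `fp_a` and `fp_b` are identical even though the IP addresses differ, while `fp_other` is different. A real system would also need the tamper proofing the slide mentions, which this sketch does not attempt.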

Page 24: 17 00 distil rami

The Majority of Bad Bots Now Use Multiple IP Addresses

Bots that dynamically rotate IP addresses or distribute attacks are significantly harder to detect and mitigate.

Page 25: 17 00 distil rami

Sticky Bot Tracking With No Impact On Real Users

Device Fingerprinting
Fingerprints stick to the bot even if it attempts to reconnect from random IP addresses or hide behind an anonymous proxy or peer-to-peer network.

Tracks distributed attacks that would normally fly under the radar.

Without Distil

With Distil

Without Impacting Users Sharing the Same IP
Avoids blocking residential users or organizations that might share the same NAT as the bot or botnet.
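
The contrast between tracking by IP address and tracking by fingerprint can be sketched with a simple request counter. This is an illustration under stated assumptions, not Distil's implementation: the traffic, the threshold, and the helper `flag_by_key` are all hypothetical.

```python
from collections import Counter


def flag_by_key(traffic, key, threshold):
    """Return the set of key values that made more than `threshold` requests."""
    counts = Counter(request[key] for request in traffic)
    return {value for value, n in counts.items() if n > threshold}


# Hypothetical bot: one fingerprint rotating through 50 IPs, 2 requests per IP.
bot_traffic = [
    {"ip": f"198.51.100.{i}", "fingerprint": "bot-fp"}
    for i in range(50)
    for _ in range(2)
]

# Hypothetical humans: three users behind one NAT IP, 5 requests each.
human_traffic = [
    {"ip": "203.0.113.10", "fingerprint": f"human-{u}"}
    for u in range(3)
    for _ in range(5)
]

traffic = bot_traffic + human_traffic
THRESHOLD = 10  # assumed request limit per key

flagged_ips = flag_by_key(traffic, "ip", THRESHOLD)
flagged_fingerprints = flag_by_key(traffic, "fingerprint", THRESHOLD)
```

With these made-up numbers, counting by IP flags only the shared NAT address (15 requests from three real users) and misses the bot entirely, because each rotated IP sends just 2 requests. Counting by fingerprint flags only `bot-fp` (100 requests) and leaves the three users alone.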

Page 26: 17 00 distil rami

Case Study

Page 27: 17 00 distil rami

Cleaning-Up Your Data

Page 28: 17 00 distil rami

In 2015 the most targeted verticals were digital publishing and real estate. Real estate sites saw a 300% increase in bad bot traffic!

Traffic by Type of Site, 2014 vs 2015

Page 29: 17 00 distil rami

How Web Scrapers Impact KPIs

Web scraping hurts your KPIs...
○ Slowdowns, downtime, and poor user experiences
○ Increase in costs (infrastructure and people)
○ Distortion of web analytics
○ Digital ad fraud, reputation and trust (bad leads)

Page 30: 17 00 distil rami

Majority of Bots are Advanced Persistent Bots (APBs)

APBs have one or more of the following abilities:

Advanced
○ Mimic human behavior
○ Load JavaScript
○ Load external resources
○ Support cookies
○ Browser automation (Selenium, PhantomJS)

Persistent
○ Dynamic IP rotation
○ Distribute attacks across IP addresses
○ Hide behind anonymous and peer-to-peer proxies

Source: 2016 Distil Bad Bot Report

Page 31: 17 00 distil rami

Loading Assets & Bots Mimicking Humans

[Chart: % of bots able to load external assets (e.g. JavaScript); % of bots able to mimic human behavior]

These bots will skew marketing tools such as Google Analytics, A/B testing, conversion tracking, etc.

These bots will fly under the radar of most security tools.
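
To make the skew concrete, here is a small worked example with invented numbers. It assumes bots that load JavaScript are counted as sessions by an analytics tool but never convert. The session and conversion counts are hypothetical and are not taken from the presentation.

```python
def conversion_rate(sessions: int, conversions: int) -> float:
    """Conversions divided by sessions."""
    return conversions / sessions


# Hypothetical traffic for one reporting period.
human_sessions = 10_000
bot_sessions = 5_000   # bots that load JavaScript and so get counted
conversions = 200      # only humans convert

true_rate = conversion_rate(human_sessions, conversions)
reported_rate = conversion_rate(human_sessions + bot_sessions, conversions)

print(f"true rate:     {true_rate:.2%}")
print(f"reported rate: {reported_rate:.2%}")
```

Under these assumptions the real conversion rate is 2.00%, but the tool reports 1.33%, so an A/B test or campaign could look worse than it is simply because of uncounted bot traffic.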

Page 32: 17 00 distil rami

Bots Throw Off Analytics

Page 33: 17 00 distil rami

Impressions and Clicks Remain the Biggest Targets

Impressions (CPM/CPV) and Clicks (CPC): $42.5B, 86% digital spend
• Search: $18.8B
• Display: $7.9B
• Video: $3.5B
• Mobile: $6.2B and $6.2B

Leads (CPL) and Sales (CPA): $7B
• Lead Gen: $2.0B
• Other: $5.0B (classifieds, sponsorship, rich media)

Chart legend: estimated fraud, not at risk

Page 34: 17 00 distil rami

Bots Don't Buy Houses

Page 35: 17 00 distil rami


Page 36: 17 00 distil rami

Case Study

Page 37: 17 00 distil rami

The Only Easy and Accurate Way to Protect Web Applications from Bad Bots, API Abuse, and Fraud.

Page 38: 17 00 distil rami

Detect and Distil Traffic

Page 39: 17 00 distil rami

No Longer Blind Defense
Complete Visibility into False Positives

17 million CAPTCHAs served

78 solved

False Positive Rate = 0.00000458
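
The false positive rate follows from dividing solved CAPTCHAs by served CAPTCHAs, on the assumption that a solved CAPTCHA means a real user was challenged by mistake. A quick check, using the rounded figure of 17 million from the slide:

```python
captchas_served = 17_000_000  # the slide rounds this to "17 million"
captchas_solved = 78

# Share of served CAPTCHAs that were solved.
false_positive_rate = captchas_solved / captchas_served

# The same figure expressed as "one in N" challenges.
one_in_n = captchas_served / captchas_solved

print(f"{false_positive_rate:.8f}")
print(f"about 1 in {one_in_n:,.0f}")
```

With the rounded count this gives roughly 0.0000046, or about one solved CAPTCHA per 218,000 served. The slide's 0.00000458 differs slightly in the last digit, presumably because it was computed from the exact number of CAPTCHAs served rather than the rounded 17 million.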

Page 40: 17 00 distil rami

www.distilnetworks.com/trial/
Offer Ends: October 30, 2016

Two Months of Free Service + Traffic Analysis

Page 41: 17 00 distil rami

www.distilnetworks.com

QUESTIONS? COMMENTS?
INFO@DISTILNETWORKS.COM
OR CALL US ON 1.866.423.0606