promptcloud
Crawling means automatically fetching the data you want from a website. But is it legal? Are there rules to follow? More at PromptCloud.com
IS CRAWLING LEGAL?
COMMON CRAWLING QUERIES…
CAN YOU CRAWL AMAZON.COM?
GET PROFILE DATA FROM LINKEDIN?
E-COMMERCE DATA?
VITAL QUESTION
IS IT LEGAL TO GET THIS DATA?
YES & NO
NOT REALLY!
WEB CRAWLER?
CRAWLING IS THE AUTOMATED FETCHING OF WEBPAGE CONTENT
CRAWLING TABOO?
IT IS QUITE OFTEN USED AGAINST WEBSITE POLICIES &
BREAKS THE GROUND RULES OF CRAWLING
I. ROBOTS.TXT
TELLS YOU WHICH URLS CAN OR CANNOT BE CRAWLED!
RARELY BOT-SPECIFIC
ONLY A GUIDELINE
NOT LEGALLY ENFORCEABLE
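As a rough illustration of rule I, the sketch below checks URLs against a robots.txt file using Python's standard `urllib.robotparser` module. The robots.txt content, the `ExampleBot` user agent, the `example.com` URLs and the helper names `build_parser` and `is_allowed` are all assumptions made up for this example; the rules are inlined so it runs without touching the network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for this sketch).
# Note the wildcard user agent: rules are rarely bot-specific.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" and the example.com URLs are placeholders.
print(is_allowed(parser, "ExampleBot", "https://example.com/public/page"))   # True
print(is_allowed(parser, "ExampleBot", "https://example.com/private/page"))  # False
```

Against a live site, `RobotFileParser.set_url()` followed by `read()` would load the real file instead of the inlined text. Since robots.txt is only a guideline, a check like this is a courtesy the crawler chooses to honour, not something the site can enforce through the file itself.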
II. PUBLIC CONTENT
CRAWL ONLY PUBLIC CONTENT!
COPYRIGHT MUST NOT BE NEGLECTED
USE IT WELL
III. TERMS OF USE
CHECK BEFORE CRAWLING!
IV. AUTHENTICATION REQUIRED
BEFORE ACCESSING CONTENT
NO BOT POLICY. HUMANS ONLY!
V. CRAWL DELAY
MAINTAIN DELAY BETWEEN CRAWLS
HIT SERVER TOO HARD, TOO FAST…
CHANCES ARE THAT YOUR IPs WILL BE BLOCKED!
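One possible way to maintain a delay between crawls is to remember when each host was last hit and pause before hitting it again. The sketch below is a minimal illustration only: the `PoliteScheduler` class, its 5-second default and the `example.com` URL in the usage note are assumptions, not values taken from any particular site.

```python
import time
from urllib.parse import urlparse


class PoliteScheduler:
    """Enforce a minimum delay between requests to the same host."""

    def __init__(self, delay_seconds=5.0, clock=time.monotonic, sleep=time.sleep):
        # delay_seconds is an assumed default; use the site's own
        # crawl-delay value when it publishes one.
        self.delay_seconds = delay_seconds
        self._clock = clock    # injectable so the class can be tested
        self._sleep = sleep    # without really sleeping
        self._last_hit = {}    # host -> time of the last request

    def wait_time(self, url):
        """Seconds still to wait before this URL's host may be hit again."""
        host = urlparse(url).netloc
        last = self._last_hit.get(host)
        if last is None:
            return 0.0
        elapsed = self._clock() - last
        return max(0.0, self.delay_seconds - elapsed)

    def wait_turn(self, url):
        """Pause if needed, record the hit, and return the pause length."""
        pause = self.wait_time(url)
        if pause > 0:
            self._sleep(pause)
        self._last_hit[urlparse(url).netloc] = self._clock()
        return pause
```

A crawler loop would call `scheduler.wait_turn("https://example.com/page")` just before each request. The delay is tracked per host, so pages on different sites do not hold each other up, while repeated hits on one server are spaced out.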
WHY ALLOW CRAWLING?
• CONTENT REACHES PUBLIC
  • Crawling increases content discovery as long as rules are followed
• SITES HAVE TRUCKLOADS OF INFORMATION!
  • Bots assimilate entire site data automatically
• CRAWLING YIELDS PRECIOUS DATA
  • Businesses gain competitive advantage
  • Data Analytics gives the edge here
VERDICT?
CRAWLING ISN’T STRICTLY ‘ILLEGAL’
BE POLITE
FOLLOW THE GROUND RULES
UNLESS…
YOU ASK…
WHAT IS THE DATA BEING GATHERED?
WHAT IS ITS USE?
DATA CRAWLING REQUIREMENT?