HP Tech Forum 2009 presentation covering some of the ways spammers harvest email addresses on the Internet (and how you can prevent it), including an in-depth look at three commonly used software packages.
Produced in cooperation with: HP Technology Forum & Expo 2009
© 2009 Hewlett-Packard Development Company, L.P.
The information contained herein is subject to change without notice
Email Address Harvesting
Michael Lamont
Senior Software Engineer
June 17, 2009
Overview
• What is email address harvesting?
• How do spammers do it?
• What can you do about it?
• Examples of harvesting software
Mandatory Definition Slide
• Email address harvesting is the process used by spammers to extract email addresses from public sources.
• Common sources:
− Web sites
− Newsgroups
− Mailing lists
− Chat rooms
Mandatory “How Bad Is It?” Slide
• FTC: 86% of all email addresses posted on web pages receive spam.
• FTC: 93% of all email addresses used in newsgroups receive spam.
• PSC honeypot record: Address received spam 4 minutes after being included in a newsgroup post.
Address Lists
• Spammers use address harvesting to build giant lists of addresses to send spam to.
• Most lists have 1-20 million addresses.
• Spammers sell/share their lists, so being on even just one list will get you a lot of spam.
Evolution Of The Address List
• Somebody (probably not even a spammer) harvests addresses from various sources.
• A “good” harvester scrubs the list.
• The harvester sells the list to lots of spammers.
• Once your address is on a list, it’s going to be on one or more lists forever.
Harvesting From Web Sites
• Spammers usually use a spider program to scrape addresses off of web pages.
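To see what a spider is likely to pick up, you can run the same kind of pattern match against your own pages. The sketch below is a minimal self-audit in Python. It assumes that a simple regular expression is representative of what harvesting tools look for; the pattern and the function name `find_exposed_addresses` are illustrative and not taken from any particular tool.

```python
import re
from typing import List

# Illustrative pattern (an assumption): a deliberately simple approximation
# of an email address. Real harvesters may use looser or stricter rules.
ADDRESS_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_exposed_addresses(html_text: str) -> List[str]:
    """Return the distinct plain-text addresses in a page, in order of appearance."""
    seen: List[str] = []
    for match in ADDRESS_PATTERN.findall(html_text):
        if match not in seen:
            seen.append(match)
    return seen


page = '<p>Contact <a href="mailto:jdoe@hp.com">jdoe@hp.com</a> or jdoe at hp dot com.</p>'
print(find_exposed_addresses(page))
```

Anything this function returns is visible to an equally simple harvester; the longhand form at the end of the sample page is not matched.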
Harvesting From Web Sites
• Web directories make it easy to get lots of addresses
UseNet Newsgroups
• Email addresses are splattered all over:
− Message headers
− Signatures
− Attributions
• Spider programs exist to extract these addresses as well.
Mailing Lists
• Lots of list manager software provides a list of every email address on a list.
• Spammers are happy to join a mailing list temporarily to get access to a list of subscribers.
• Some clever spammers copy an innocuous newbie question from the list archives and send it to the list with a read-receipt request, so that subscribers' mail clients reply and reveal their addresses.
3rd Party Mailing Lists
• People you’ve provided your address to provide it to 3rd parties (usually for profit).
• Example: Auto insurance quote
• Initial sale of list might be aboveboard, but lists have a way of trickling down to less desirable senders.
Web Browser Holes
• Newer browsers have eliminated most of these, but they’re still common in older browsers.
• Extraction of the email address from the HTTP From request header (HTTP_FROM in CGI) that the browser sends to the web server.
• JavaScript to extract email address from browser’s configuration.
Web Browser Holes
• Force browser to fetch an image on a page by anonymous FTP.
− Most browsers use the configured email address as the password.
• JavaScript action that sends an email message in the background on page load.
Chat Rooms
• Web bots monitor chat rooms and extract user names.
• Lots of providers (AOL, Yahoo) use the same profile names for both chat rooms and email.
• IRC used to be fertile harvesting ground, but less savvy users have largely stopped using it.
Domain Contacts
• Every registered domain name has one or more contact addresses.
• Addresses are publicly accessible (WHOIS)
• Addresses are almost always valid and read by a real person on a regular basis.
Guessing
• Spammers assemble a list of guessed email addresses.
• The guessed addresses are tested against one or more email servers.
• Addresses that the servers accept as valid are added to a list of addresses to be spammed.
• This is usually referred to as directory harvesting.
CAN-SPAM
• The federal CAN-SPAM Act treats email address harvesting as an aggravated violation when the harvested addresses are used to send unlawful commercial email.
• Some providers of the harvesting software have ceased and desisted, but harvesting has actually increased.
• Like most legal solutions, CAN-SPAM is severely constrained by jurisdictional boundaries.
Harvesting Prevention
• The harder it is for spammers to get your address, the harder it is for them to spam you.
• “I don’t care – my spam filter is awesome. Bring it on!”
• No filter is 100% accurate.
• Filtering still places load on the filtering system and/or email server.
Prevention Methods
• Reformatting addresses
• Web forms
• JavaScript-generated mailto links
• Graphical addresses
• Throwaway addresses
Reformatting Addresses
• Prevents harvesting from web pages and newsgroups.
• Simple examples include inserting bogus strings into the address to make it invalid:
Reformatting Addresses
• Writing the address out longhand can prevent harvesters from recognizing it as an email address:
jdoe at hp dot com
• Inserting extra whitespace can also help:
jdoe @ hp.com
jdoe @ hp.com
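The two reformatting styles above are easy to apply automatically before an address is published. The following Python sketch shows one way to do it; the function names `spell_out` and `pad_at_sign` are hypothetical, and only the dots in the host part are spelled out.

```python
def spell_out(address: str) -> str:
    """Rewrite 'user@host.tld' as 'user at host dot tld'."""
    user, _, host = address.partition("@")
    return f"{user} at {host.replace('.', ' dot ')}"


def pad_at_sign(address: str) -> str:
    """Insert whitespace around the @ sign: 'user @ host.tld'."""
    user, _, host = address.partition("@")
    return f"{user} @ {host}"


print(spell_out("jdoe@hp.com"))
print(pad_at_sign("jdoe@hp.com"))
```

Both forms stay readable to a person, who has to retype the address by hand.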
Reformatting Addresses
• Characters written as HTML numeric character references are decoded by most web clients, but not by most spamware:
jdoe@p&#114;ocess&#046;com
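A short Python sketch of this encoding follows. It encodes every character as a decimal character reference (the slide example encodes only a few), and uses the standard library's `html.unescape` to confirm that a web client would recover the original address. The function name `encode_as_entities` is illustrative.

```python
import html


def encode_as_entities(address: str) -> str:
    """Encode every character of an address as a decimal HTML character reference."""
    return "".join(f"&#{ord(ch):03d};" for ch in address)


encoded = encode_as_entities("jdoe@hp.com")
print(encoded)

# A browser-style decode restores the address, so visitors still see it normally.
print(html.unescape(encoded))
```

The encoded string contains no literal @ sign, so a harvester that matches raw page source without decoding character references will not recognize it.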
Web Forms
• Provide an HTML form for web site visitors to enter a message.
• When the form is submitted, the CGI script mails the message to the appropriate recipient.
• Avoids displaying the actual address anywhere on the site.
• Can still be abused, but it’s relatively difficult to do.
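The server-side half of such a form can be sketched in Python as below. This is a minimal illustration under stated assumptions, not a complete CGI script: the `RECIPIENTS` table, the `example.com` addresses, and the function name `build_message` are all hypothetical. The page only ever exposes a recipient key, and the real address is looked up on the server.

```python
from email.message import EmailMessage

# Hypothetical server-side directory; the web page only ever shows the keys.
RECIPIENTS = {"sales": "sales@example.com", "support": "support@example.com"}


def build_message(recipient_key: str, sender: str, subject: str, body: str) -> EmailMessage:
    """Build the outgoing message for a submitted contact form."""
    if recipient_key not in RECIPIENTS:
        raise ValueError("unknown recipient")
    # Reject line breaks in header fields so the form cannot be used to
    # inject extra headers, one of the ways such forms get abused.
    for value in (sender, subject):
        if "\r" in value or "\n" in value:
            raise ValueError("line breaks are not allowed in header fields")
    msg = EmailMessage()
    msg["To"] = RECIPIENTS[recipient_key]
    msg["From"] = "webform@example.com"  # assumed fixed sender for the form
    msg["Reply-To"] = sender
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


message = build_message("sales", "visitor@example.org", "Question", "Hello there.")
print(message["To"])
```

Sending is left out of the sketch; the resulting message could be handed to `smtplib.SMTP(...).send_message(message)` on a server that has a mail relay available.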
JavaScript-Generated mailtos
• Use JavaScript to build the mailto: link dynamically when the link is clicked, so the complete address never appears in the page source.
<A HREF="javascript:void(window.location='mail'+'to:'+'jdoe'+'@'+'hp'+'.'+'com')">Click here to mail John Doe</A>
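If a site lists many addresses, links like this one can be generated rather than written by hand. The Python sketch below builds such a link from an address and a label. The function name `js_mailto_link` is hypothetical, and the sketch assumes the address contains no quote characters.

```python
import html


def js_mailto_link(address: str, label: str) -> str:
    """Return an anchor tag whose mailto: URL is assembled by JavaScript on click."""
    user, _, host = address.partition("@")
    pieces = ["mail", "to:", user, "@"]
    # Split the host on dots so that neither 'user@host' nor 'host.tld'
    # appears as one contiguous string in the page source.
    for index, part in enumerate(host.split(".")):
        if index:
            pieces.append(".")
        pieces.append(part)
    expression = "+".join(f"'{piece}'" for piece in pieces)
    return f'<a href="javascript:void(window.location={expression})">{html.escape(label)}</a>'


link = js_mailto_link("jdoe@hp.com", "Click here to mail John Doe")
print(link)
```

The generated markup contains neither the full address nor the string "mailto:", which are the usual things a pattern-matching harvester searches for.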
Graphical Addresses
• Displaying all or part of an email address as a graphical image will throw off most harvesting software.
• No known harvesting software is OCR-capable.
− Anecdotal reports of at least one large spam organization trying to develop accurate OCR harvesters
Graphical Address Complexity
• Graphical @ sign:
− Probably sufficient to throw off most harvesters.
− Username and hostname are still in close proximity.
− Works easily for multiple users/multiple domains.
jdoe [image of @ sign] hp.com
Graphical Address Complexity
• Graphical @hostname:
− Should prevent any harvester from working.
− Requires a different image for each email domain.
jdoe [image of @hostname]
Graphical Address Complexity
• Graphical everything:
− For the truly paranoid.
− Completely unreadable by harvesters unless they’re OCR-enabled.
− Requires either a lot of images or a script that can dynamically generate them.
Throwaway Addresses
• Many people create an email account that they use only for web pages and newsgroups.
• Some software products go further and let you create an alias for every occasion.
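One way to picture per-occasion aliases is sketched below in Python. It is only an illustration under loud assumptions: it assumes your mail system delivers `user+anything@domain` to `user` (plus-style subaddressing, which not every provider supports), and the functions `make_alias` and `alias_occasion` are hypothetical, not part of any product mentioned here. A short keyed tag makes it possible to tell a genuine alias from a guessed one.

```python
import hashlib
import hmac
from typing import Optional


def make_alias(user: str, domain: str, occasion: str, secret: bytes) -> str:
    """Derive a per-occasion alias of the form 'user+occasion.tag@domain'."""
    tag = hmac.new(secret, occasion.encode("utf-8"), hashlib.sha256).hexdigest()[:8]
    return f"{user}+{occasion}.{tag}@{domain}"


def alias_occasion(alias: str, user: str, domain: str, secret: bytes) -> Optional[str]:
    """Return the occasion an alias was issued for, or None if the tag does not match."""
    local, _, host = alias.partition("@")
    if host != domain or not local.startswith(user + "+"):
        return None
    occasion, _, _tag = local[len(user) + 1:].rpartition(".")
    expected = make_alias(user, domain, occasion, secret)
    if hmac.compare_digest(alias.encode("utf-8"), expected.encode("utf-8")):
        return occasion
    return None


alias = make_alias("jdoe", "example.com", "newsgroups", b"demo-secret")
print(alias)
print(alias_occasion(alias, "jdoe", "example.com", b"demo-secret"))
```

When spam starts arriving at one alias, the occasion embedded in it shows where the address leaked, and that single alias can be blocked without touching the others.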
• You still need a static address for business cards, resumes, etc.
Harvesting Software
• Spammers use tons of specialized software (spamware) to harvest addresses.
• Most spamware is developed in Eastern Europe and Asia.
• We’re going to look at several of the most popular packages.
List Harvester
• Harvests addresses from web sites.
• “Targeted” harvesting - in theory, the harvested email addresses have something in common.
• Appears to be based in China.
• http://www.listharvester.com
• Price: $699 US
List Harvester - Method
• Performs a search for one or more keywords on the user’s choice of search engine.
• Parses every site returned by the search engine in order, looking for addresses and links.
• Follows links to other pages and parses them for addresses as well.
List Harvester - Screenshots
• Screenshots (images not preserved): start screen, search terms entry, search parameters, search filters, parsing engine options, saving the list of extracted addresses, harvesting in progress.
Atomic Email Hunter
• Harvests addresses from web sites.
• Either scans an entire web site for addresses or performs a “targeted search” like List Harvester.
• Based in Russia, most likely Moscow.
• http://www.massmailsoftware.com/
• Price: $79.85 US
Atomic Email Hunter - Screenshots
• Screenshots (images not preserved): start screen, web download settings, address filtering settings, run, results.
Fast Newsgroups Extractor
• Harvests addresses from newsgroups.
• Has a companion web site extractor that’s very similar to Atomic Email Hunter.
• Based in Russia, most likely Moscow.
• http://www.lencom.com
• Price: $79.00 US
Fast Newsgroups Extractor - Method
• Lets user select one or more newsgroups to extract content from.
• Downloads multiple messages simultaneously from the NNTP server.
• Extracts addresses from the downloaded messages.
• Has the ability to limit downloaded messages to those that contain certain text in the subject.
Fast Newsgroups Extractor - Screenshots
• Screenshots (images not preserved): start screen, news server setup, newsgroup list download, newsgroup selection, harvesting job setup, run.
Quick Review
• We talked about:
− What email address harvesting is
− What data sources are harvested
− How you can protect your addresses
− 3 software packages used by spammers to harvest addresses