Spam Sinkholing
Nick Feamster
Introduction
• Goal: Identify bots (and botnets) by observing second-order effects
– Observe “application” behavior that’s likely to contain bot activity (spam is a good candidate: > 85% of spam coming from bots as of 4Q 2005)
• Advantages:
– Direct observation of behavior
– Potentially very wide lens
– Passive
• Disadvantage: No ground truth
Spam Collection Overview
• Trap mail sent to “dead” domains
• Log IPs
• Perform active and passive measurements
– Traceroute
– Passive SYN fingerprints
– DNSBL lookups, etc.
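The DNSBL lookup step can be sketched as follows. This is a minimal illustration, not the collection pipeline itself: the function names and the zone `dnsbl.example.org` are hypothetical, and it assumes the usual DNSBL convention that an IPv4 address is queried by reversing its octets and appending the list's zone, with any A record in the answer meaning "listed".

```python
import ipaddress
import socket


def dnsbl_query_name(client_ip: str, zone: str) -> str:
    """Build the DNS name to query for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(client_ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(client_ip: str, zone: str) -> bool:
    """Return True if the DNSBL zone answers with an A record for the IP.

    Performs a live DNS query; a failed resolution is treated as "not listed".
    """
    try:
        socket.gethostbyname(dnsbl_query_name(client_ip, zone))
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # Hypothetical zone name, used only to show the query format.
    print(dnsbl_query_name("192.0.2.10", "dnsbl.example.org"))
```

Only `dnsbl_query_name` is pure; `is_listed` depends on the network and on the chosen list's policies, so its results should be logged with a timestamp alongside the client IP.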
Data Collection Overview
[Figure: data collection diagram. Labels: Spammer (three shown), DNS, “MX lookups resolve to sinkhole”, Mail Avenger, sendmail, Blowtorch (GTISC), dynamo, rsync, “(schema on wiki)”]
• O(100k) pieces of spam per week
• Hundreds of domains
Sample Mail Avenger Header
Highly configurable SMTP server that collects many useful statistics
Database Schema Sample
CREATE TABLE spamtrap_email (
    entrytime        timestamp with time zone default NULL,
    trap_domain      text default NULL,
    client_ip        ip4 default NULL,       -- ip4: presumably the type from the ip4r extension
    client_port      integer default NULL,   -- was smallint, which cannot hold ports above 32767
    traceroute_time  timestamp with time zone default NULL,
    to_              text default NULL,
    delivered_to     text default NULL,
    subject          text default NULL,
    xmailer          text default NULL,
    from_            text default NULL,
    emailid          serial,                 -- serial supplies its own default
    dnsbl_id         integer default NULL,   -- assumed column; the sample names it only in the key
    FOREIGN KEY (dnsbl_id) REFERENCES spamtrap_dnsbl(dnsbl_id)
) tablespace dataspace;
Uses for Data
• Identification: Low-confidence list of likely bot IPs
• Bootstrapping: Use as a starter set for some “intractable” analysis problems
– Use this low-confidence list to prune DNSBL graph mining
– Feed this information back to ISPs to focus mining
• Second-order effects
– Analysis of hosting sites for URLs
– Clustering
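One way to produce the low-confidence list is sketched below. The heuristic is an assumption made for illustration and is not taken from the slides: an IP is flagged once it has sent mail to at least `min_domains` distinct trap domains. The function name and the `(client_ip, trap_domain)` record shape are hypothetical, loosely following the `client_ip` and `trap_domain` columns of the schema sample.

```python
from collections import defaultdict


def likely_bot_ips(records, min_domains=2):
    """Return IPs seen at `min_domains` or more distinct trap domains.

    `records` is an iterable of (client_ip, trap_domain) pairs.
    The threshold is an illustrative assumption, not a validated cutoff.
    """
    domains_by_ip = defaultdict(set)
    for client_ip, trap_domain in records:
        domains_by_ip[client_ip].add(trap_domain)
    return sorted(
        ip for ip, domains in domains_by_ip.items() if len(domains) >= min_domains
    )


if __name__ == "__main__":
    sample = [
        ("192.0.2.1", "a.example"),
        ("192.0.2.1", "b.example"),
        ("192.0.2.2", "a.example"),
    ]
    print(likely_bot_ips(sample))
```

Because there is no ground truth, a list like this is better treated as a seed for the pruning and ISP feedback steps than as a verdict on any single address.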
Analysis Within Spam Dataset
• Clustering to identify groups (coordination suggests likely bot)
– Temporal-based correlation
– Content-based correlation (based on URLs)
• Analysis of hosting URLs: Perhaps useful for identifying phishing sites
– Where hosted?
– Transience?
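The URL-based, content-based correlation can be sketched as grouping sender IPs by the hosts their messages link to. This is a minimal illustration under stated assumptions: the function names, the simple URL regular expression, and the `(client_ip, body)` message shape are all hypothetical, and a real pipeline would also need MIME decoding and HTML parsing.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Deliberately simple pattern; it will miss obfuscated or encoded URLs.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def url_hosts(body: str) -> set:
    """Extract the lowercased hostnames of http(s) URLs found in a message body."""
    hosts = set()
    for url in URL_PATTERN.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:  # malformed URL, e.g. a broken IPv6 literal
            continue
        if host:
            hosts.add(host)
    return hosts


def cluster_ips_by_url_host(messages):
    """Map each URL host to the sender IPs advertising it.

    `messages` is an iterable of (client_ip, body) pairs. Only hosts
    shared by more than one IP are kept, since sharing is what hints
    at coordination.
    """
    ips_by_host = defaultdict(set)
    for client_ip, body in messages:
        for host in url_hosts(body):
            ips_by_host[host].add(client_ip)
    return {host: ips for host, ips in ips_by_host.items() if len(ips) > 1}
```

The same host-to-IP map also gives a starting point for the hosting questions above: the keys are the hosts whose location and lifetime would be worth tracking.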
Correlation: Across Datasets
• DNSBL datasets require bootstrapping
– As per SRUTI paper
– Use spam dataset as a graph pruning mechanism
• Possibility: Use spam sinkhole as a source for malware. Strip attachments.
– Likely already being done by lots of others
• Get information about exfiltration email addresses and domains from binary analysis
– Look for those appearing in sinkhole to build confidence and monitor ongoing activity
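Matching binary-analysis output against the sinkhole can be sketched as a normalized set lookup. The function names and input shapes here are hypothetical: `exfil_addresses` and `exfil_domains` stand for whatever the binary analysis yields, and `sinkhole_addresses` for addresses pulled from sinkhole mail (for example the `to_` and `from_` fields).

```python
def normalize_address(address: str) -> str:
    """Lowercase an email address and drop surrounding whitespace and angle brackets."""
    return address.strip().strip("<>").lower()


def sinkhole_hits(exfil_addresses, exfil_domains, sinkhole_addresses):
    """Return sinkhole addresses matching a known exfiltration address or domain."""
    wanted_addresses = {normalize_address(a) for a in exfil_addresses}
    wanted_domains = {d.strip().lower() for d in exfil_domains}
    hits = set()
    for raw in sinkhole_addresses:
        addr = normalize_address(raw)
        _, at_sign, domain = addr.rpartition("@")
        # Require an "@" so a bare string cannot match as a domain.
        if addr in wanted_addresses or (at_sign and domain in wanted_domains):
            hits.add(addr)
    return hits
```

Running this periodically over new sinkhole mail, and recording when each hit first and last appears, is one simple way to monitor ongoing activity.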