Multilayer Filtering or The Dangerous Economics of Spam Control
2008 MIT Spam Conference
By Alena Kimakova and Reza Rajabiun
York University and COMDOM Software, Toronto, Canada
Slide 2
I.1 Spam as an empirical problem
Two historical observations (2002-2008):
A) Spam ratio in 2002: 20%-30% of all email messages. Spam ratio in 2008: 70%-90% of all email messages. Increased sophistication of spam (PDF, image, search engine spam, etc.)
B) Increased sophistication and accuracy of statistical content filters: 98% accuracy, 0.1% false positives (Cormack and Lynam, 2007)
Empirical puzzle: Why is there more spam after the adoption of technical and regulatory countermeasures?
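To see why a rising spam ratio matters even with an accurate filter, the Python sketch below computes the share of spam among delivered messages. It assumes that "98% accuracy" means the filter blocks 98% of spam and that the 0.1% false positive rate applies to legitimate mail; the function name inbox_spam_share is illustrative.

```python
def inbox_spam_share(spam_ratio, catch_rate=0.98, fp_rate=0.001):
    """Share of delivered messages that are spam after filtering.

    spam_ratio: share of all incoming mail that is spam
    catch_rate: assumed share of spam the filter blocks ("98% accuracy")
    fp_rate:    share of legitimate mail wrongly blocked
    """
    delivered_spam = spam_ratio * (1.0 - catch_rate)
    delivered_ham = (1.0 - spam_ratio) * (1.0 - fp_rate)
    return delivered_spam / (delivered_spam + delivered_ham)


for ratio in (0.25, 0.90):
    print(f"spam ratio {ratio:.0%}: {inbox_spam_share(ratio):.1%} of delivered mail is spam")
```

Under these assumptions, roughly 0.7% of delivered mail is spam at a 25% spam ratio, versus about 15% at a 90% spam ratio: the same filter lets through a much larger residual once spam dominates the stream.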
Slide 3
I.2 Methodology: Positive analysis
How can we explain the growth and sophistication of spam?
Hypothesis: A technological trade-off between speed and accuracy facing network owners and operators.
The approach combines:
A) Game theoretical models: Large volumes of spam result from asymmetries in the distribution of filter quality across the Internet.
B) The evolution of the technological possibilities frontier facing ISPs and other operators since the early 2000s.
Problem with existing studies in economics and computer science: They do not account for the incentives of spammers and ISPs.
General point: Interdisciplinary cooperation between economists and computer scientists is important in designing spam filtering bundles and regulatory countermeasures.
Slide 4
II.1 Technological choice
Advances in content filtering accuracy have constrained the sensory threat of spam.
However: With a high noise/signal ratio, the network costs of spam rise.
This is of particular concern in developing countries with relatively lower:
a) Bandwidth
b) Processing capacity
c) Administrative capacity
(Spam and the Digital Divide, Rajabiun, 2007)
The literature in computer science and economics focuses almost exclusively on the false negative/positive problem.
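The link between the noise/signal ratio and network costs is simple arithmetic: for a fixed volume of legitimate mail, the total volume a network must carry grows as 1/(1 - p), where p is the spam share. A minimal Python sketch (the function names are illustrative):

```python
def noise_signal_ratio(spam_share):
    """Spam messages per legitimate message."""
    return spam_share / (1.0 - spam_share)


def traffic_multiplier(spam_share):
    """Total inbound messages per legitimate message."""
    return 1.0 / (1.0 - spam_share)


for p in (0.25, 0.80, 0.90):
    print(f"spam share {p:.0%}: noise/signal {noise_signal_ratio(p):.2f}, "
          f"traffic x{traffic_multiplier(p):.2f}")
```

At a 90% spam ratio, each legitimate message arrives with nine spam messages, so inbound traffic is ten times the legitimate volume; this load must be carried and processed whether or not a content filter eventually catches the spam.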
Slide 5
II.2 End user and network costs
More realistic assumption: End user (E) and network (N) costs of spam are likely to be closely linked.
General problem facing an ISP (server level problem):
$$\text{Costs of Spam} = C\big(E(E_1, E_2),\; N(E_1, E_2, S)\big)$$
where $E_1$ is the expected false negative rate, $E_2$ the expected false positive rate, and $S$ the number of servers.
Theory: Little is known about these relationships, but they are not static.
Practice: They can be estimated for individual ISPs based on:
a) Accounting information
b) Features of the antispam systems available at a point in time
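The slide leaves the functional forms of C, E and N open. As an illustration only, the Python sketch below assumes an additive form C = E + N with linear components and entirely hypothetical unit costs, and compares a fast, less accurate filter with an accurate filter that needs more servers. None of the numbers come from the presentation; FilterOption and spam_cost are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterOption:
    """A candidate filter, described by the arguments of the cost function."""
    name: str
    e1: float  # expected false negative rate (share of spam delivered)
    e2: float  # expected false positive rate (share of legitimate mail blocked)
    s: int     # number of servers needed to sustain the required throughput


def spam_cost(f, spam, ham, server_cost,
              user_cost_per_spam=0.01, user_cost_per_lost_ham=1.0,
              net_cost_per_spam=0.001, net_cost_per_lost_ham=0.1):
    """C = E(E1, E2) + N(E1, E2, S) with hypothetical linear components."""
    # End user costs: reading delivered spam, losing legitimate mail.
    e = f.e1 * spam * user_cost_per_spam + f.e2 * ham * user_cost_per_lost_ham
    # Network costs: servers, plus downstream handling of filter errors
    # (e.g. storing delivered spam, retries and support for blocked mail).
    n = (f.s * server_cost + f.e1 * spam * net_cost_per_spam
         + f.e2 * ham * net_cost_per_lost_ham)
    return e + n


fast = FilterOption("fast, less accurate", e1=0.30, e2=0.001, s=2)
accurate = FilterOption("accurate, slower", e1=0.02, e2=0.001, s=10)

for server_cost in (500.0, 100.0):
    best = min((fast, accurate),
               key=lambda f: spam_cost(f, 900_000, 100_000, server_cost))
    print(f"server cost {server_cost:.0f}: choose {best.name}")
```

With these made-up parameters, the fast filter is cheaper when a server costs 500 units per period and the accurate one is cheaper at 100 units, which illustrates why the choice has to be estimated for each ISP rather than settled in general.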
Slide 6
II.3 Antispam technology
Basic filtering methods have been available since the late 1990s.
Server level: Adoption of (fuzzy) fingerprinting (2001-2005) and reputation-based systems (2004-2006) upstream (fast, but not accurate)
End user level: Statistical (Bayesian) content filters (accurate, but not fast)
Other technical and public policy measures aim to increase the costs of sending spam (Hashcash, civil/criminal law, do-not-call registries).
The optimal choice of filter depends on the identity of the end user/ISP: upstream ISPs are more sensitive to speed, downstream ones to accuracy.
Divergence between the (socially) optimal and the actual technological choice
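For readers less familiar with the end user layer, the Python sketch below shows a textbook multinomial naive Bayes content filter with add-one smoothing, trained on a hypothetical toy corpus. It is a generic illustration of statistical (Bayesian) filtering, not the method of any particular product; NaiveBayesFilter and its methods are illustrative names.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9$']+", text.lower())


class NaiveBayesFilter:
    """Toy multinomial naive Bayes spam filter (hypothetical illustration)."""

    def __init__(self):
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.token_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text):
        vocab_size = len(set(self.token_counts["spam"]) | set(self.token_counts["ham"]))
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            counts = self.token_counts[label]
            total_tokens = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)  # class prior
            for token in tokenize(text):
                # Add-one (Laplace) smoothing so unseen tokens do not zero out the score.
                score += math.log((counts[token] + 1) / (total_tokens + vocab_size))
            log_scores[label] = score
        # Normalize in log space to avoid underflow on long messages.
        top = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - top)
        ham = math.exp(log_scores["ham"] - top)
        return spam / (spam + ham)


nb = NaiveBayesFilter()
for text in ("cheap pills buy now", "win cash now", "cheap viagra offer"):
    nb.train(text, "spam")
for text in ("meeting agenda for tomorrow", "lunch tomorrow with the team",
             "project report attached"):
    nb.train(text, "ham")

print(nb.spam_probability("cheap pills now"))          # well above 0.5
print(nb.spam_probability("meeting agenda tomorrow"))  # well below 0.5
```

Every token of every message is scored against both classes, which is one reason this layer is accurate but comparatively slow next to a single hash or address lookup upstream.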
Slide 7
III.1 The long tail
The distribution of taste for spam in each sub-network is not normal.
Khong (2004): Mechanisms that connect spammers with those who have a taste for spam are the first best solution (open channel argument); blocking and filters are second best.
Empirically: More spam, rather than less, after the widespread adoption of open channels.
Loder et al. (2004): The Attention Bond Mechanism (ABM) is first best because it allows for price negotiations between senders and receivers.
Basic economic assumption: The subjective theory of value. Example: the search for affordable drugs by the uninsured in the U.S.
The long tail in the natural sciences: phase transitions/multiple equilibria
Game theory: strategic complementarities
Slide 8
III.2 Sender side countermeasures
In microeconomic theory, long-tailed distributions are associated with markets where markups are invariant to the number of sellers (e.g. mutual funds).
The margin for spammers, or their expected response rate, is invariant to the number of spammers at play.
Implications: Legal sanctions and IP reputation systems increase the costs of spamming and drive some spammers out of the market, but do not thin out the market.
Intuition: As in the wars against prostitution and drugs, "hang them all" strategies are ineffective and increase social costs (Becker-Friedman).
Slide 9
IV.1 Strategic conflicts
Trivial model of spam: Tragedy of the commons. The generic solution is to increase costs on spammers, but this results in escalating spam wars.
Empirically: Increasing sender costs since the early 2000s, but more spam.
Escalation: Development and adoption of new spamming techniques
Androutsopoulos et al. (2005): A 2-player game between senders and receivers has a single Nash equilibrium. Play settles at this equilibrium in infinitely repeated games unless the underlying technologies or the taste for spam change.
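A 2x2 game makes a single-equilibrium outcome concrete. The Python sketch below enumerates pure-strategy Nash equilibria for a sender (send or abstain) and a receiver (filter or not). The payoffs are hypothetical and are not taken from Androutsopoulos et al. (2005), whose model is richer; they are chosen only so that sending pays whether or not the receiver filters.

```python
from itertools import product

SENDER = ("send", "abstain")
RECEIVER = ("filter", "no_filter")

# PAYOFFS[(sender_move, receiver_move)] = (sender payoff, receiver payoff).
# Hypothetical numbers chosen for illustration only.
PAYOFFS = {
    ("send", "filter"): (1, -1),
    ("send", "no_filter"): (3, -3),
    ("abstain", "filter"): (0, -1),
    ("abstain", "no_filter"): (0, 0),
}


def pure_nash_equilibria(payoffs):
    """Profiles where neither player gains by deviating unilaterally."""
    equilibria = []
    for s, r in product(SENDER, RECEIVER):
        u_sender, u_receiver = payoffs[(s, r)]
        sender_ok = all(payoffs[(s2, r)][0] <= u_sender for s2 in SENDER)
        receiver_ok = all(payoffs[(s, r2)][1] <= u_receiver for r2 in RECEIVER)
        if sender_ok and receiver_ok:
            equilibria.append((s, r))
    return equilibria


print(pure_nash_equilibria(PAYOFFS))  # [('send', 'filter')]
```

Because sending strictly dominates abstaining under these payoffs, ('send', 'filter') is the unique equilibrium of this toy game: spam persists alongside filtering, and only a change in the payoffs (technologies or taste for spam) moves the outcome.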
Slide 10
IV.2 Spam growth and filter quality
Reshef and Solan (2006): Blame filters for the growth of spam, due to differences in filter quality. When the costs of sending messages are not too high, the effect of improved filter quality on the total volume of spam is ambiguous.
Eaton et al. (2008): Complementarities between filters and sender side countermeasures; improving filtering alone results in more spam. Given ineffective sender side countermeasures, they suggest receiver side payments (as in SMS systems).
Kearns (2005): Spam is a source of both costs and revenues for ISPs, creating an economic incentive to adopt inefficient filters.
Slide 11
V.1 Speed versus accuracy
Existing literature: Even if they could read end user preferences accurately, upstream backbone providers do not have sufficient financial incentives to adopt the right technological countermeasures.
Argument here: Not necessarily because of financial factors alone.
Hypothesis: ISPs faced a technological trade-off between speed and accuracy.
The coordination failure is not between senders and receivers, as in the tragedy of the commons or Khong (2004), but between upstream and downstream entities/servers. Downstream entities are better off with less incoming spam, but cannot force upstream entities to do the optimal filtering for them.
Slide 12
V.2 Bundles and layers
Bundles of countermeasures facing spammers:
a) Ad hoc feature selection rules (late 1990s): centralized
b) Fingerprinting/checksum filters (2000-2005): centralized
c) IP reputation/authentication mechanisms (2004-2006): centralized
d) Statistical content filters (since the late 1990s): distributed
Asymmetric filter quality (2000-2006): (b) and (c) were fast relative to 1st generation statistical content filters (5x), but less accurate (-5% and -30% respectively).
Response by income-smoothing spammers: higher noise/signal ratio, more variants, one-shot BGP spectrum agility
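The layering on this slide can be sketched as a pipeline in which cheap centralized checks run before the costly statistical one. In the Python sketch below, an IP blocklist stands in for reputation systems and an exact checksum over normalized text stands in for fingerprinting (real fuzzy fingerprinting tolerates small edits, which this does not); toy_content_score is a hypothetical placeholder for a statistical filter. The addresses come from the RFC 5737 documentation ranges.

```python
import hashlib
import re


def checksum(body):
    """Exact SHA-256 checksum of the body after case and whitespace normalization."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


SPAM_WORDS = {"cheap", "pills", "offer"}  # hypothetical toy vocabulary


def toy_content_score(body):
    """Placeholder for a statistical filter: share of tokens that are spam words."""
    tokens = re.findall(r"[a-z]+", body.lower())
    return sum(t in SPAM_WORDS for t in tokens) / max(len(tokens), 1)


class LayeredFilter:
    def __init__(self, blocked_ips, known_spam_bodies, content_score, threshold=0.5):
        self.blocked_ips = set(blocked_ips)
        self.fingerprints = {checksum(b) for b in known_spam_bodies}
        self.content_score = content_score
        self.threshold = threshold

    def classify(self, sender_ip, body):
        if sender_ip in self.blocked_ips:               # reputation layer: set lookup
            return "spam (reputation)"
        if checksum(body) in self.fingerprints:         # fingerprint layer: one hash
            return "spam (fingerprint)"
        if self.content_score(body) >= self.threshold:  # content layer: scans tokens
            return "spam (content)"
        return "ham"


pipeline = LayeredFilter(blocked_ips={"203.0.113.7"},
                         known_spam_bodies=["Cheap pills, special offer"],
                         content_score=toy_content_score)
print(pipeline.classify("203.0.113.7", "hello"))                                # spam (reputation)
print(pipeline.classify("198.51.100.2", "cheap  PILLS, special offer"))         # spam (fingerprint)
print(pipeline.classify("198.51.100.9", "Cheap pills, special offer ref 8842")) # spam (content)
print(pipeline.classify("198.51.100.2", "Agenda for the project meeting"))      # ham
```

The third message, a one-shot variant sent from a fresh address, passes both fast layers and is caught only by the content layer. This mirrors the spammer response on the slide: more variants and new sending addresses push the work onto the slower, more accurate layer.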
Slide 13
VI.1 The response
A) Coordination by operators to strengthen authentication protocols (SPF, DKIM). Problem: A wide range of techniques is available to bypass these protocols, and even to use them as an instrument for sending more spam!
B) Closing the gap between fast and accurate filters: Further optimization of the methods for distributed content scanning, learning, and classification. 1st versus 2nd generation Bayesian filters (CRM114, Bogofilter, COMDOM)
Slide 14
VI.2 Evolution of Bayesian content filters
Slide 15
VI.3 Findings
The technological trade-off between speed and accuracy is now closed with distributed 2nd generation Bayesian filters (at least a 30x differential in throughput relative to the 1st generation). Note: Fingerprinting was 5x faster than 1st generation Bayesian filters in terms of throughput.
Fixed versus variable costs of message processing: substantial reductions in the variable costs of scanning, minor improvements in the fixed costs of classification.
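The distinction between fixed and variable processing costs can be made concrete with a simple per-message time model, t = fixed + variable * size, with throughput 1/t. The Python sketch below uses hypothetical timings (not measurements from the presentation) to show that cutting the variable scanning cost yields a large throughput gain, while the fixed classification cost then caps further speedups.

```python
def throughput(fixed_ms, variable_ms_per_kb, size_kb):
    """Messages per second for one worker, with t = fixed + variable * size (in ms)."""
    return 1000.0 / (fixed_ms + variable_ms_per_kb * size_kb)


SIZE_KB = 10.0  # hypothetical average message size

# Hypothetical timings: a large cut in variable (scanning) cost,
# a minor cut in fixed (classification) cost.
gen1 = throughput(fixed_ms=0.5, variable_ms_per_kb=1.0, size_kb=SIZE_KB)
gen2 = throughput(fixed_ms=0.45, variable_ms_per_kb=0.02, size_kb=SIZE_KB)
# Upper bound if scanning became free: only the fixed cost remains.
ceiling = throughput(fixed_ms=0.45, variable_ms_per_kb=0.0, size_kb=SIZE_KB)

print(f"speedup: {gen2 / gen1:.1f}x, ceiling: {ceiling / gen1:.1f}x")
```

With these made-up numbers the speedup is about 16x, and no further reduction in scanning cost can push it past about 23x unless the fixed classification cost also falls.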
Slide 16
VII. Summary
More spam is an instrument for:
a) Evading filters
b) Searching for people with a taste for spam
Normative question for policy makers: Should spamming be illegal? Legal sanctions may induce a moral hazard problem and potentially exacerbate the problem at the aggregate level, as spammers adopt more costly strategies/technologies (especially important for developing countries).
For designers of antispam systems/bundles: Should we retain layers that aim to increase the costs of spamming through ad hoc centralized control (e.g. IP reputation, fingerprinting)?