
Page 1

Adversarial Machine Learning: Big Data Meets Cyber Security

Bowei Xi
Department of Statistics
Purdue University

With Murat Kantarcioglu

Page 2

Introduction

Many adversarial learning problems arise in practice:

— Image Classification/Object Recognition

— Intrusion Detection

— Fraud Detection

— Spam Detection

— Social Networks

The adversary adapts to avoid being detected.

New solutions are needed to address this challenge.

Page 3

A million ways to write a spam email

From: "Ezra Martens" <ezrabngktbbem...
To: "Eleftheria Marconi" <clifton@pu...
Subject: shunless Phaxrrmaceutical
Date: Fri, 30 Sep 2005 04:49:10 -0500

Hello,

Easy Ordering   Fast Shipping   Best Prrice   Home Delivery   Total Confidentiality
VIAGRA   Ambien   CIALIS   Levitra   VALIUM   Xanax
$3.33   $1.21   $3.75

Get additional informmation ...

(The misspellings and split words are the spammer's deliberate obfuscation.)

Page 4

Understanding Adversarial Learning

It is not concept drift.

It is not online learning.

The adversary changes the objects under its control to avoid being detected.

This is a game between the defender and the adversary.

Page 5

Solution Ideas

Constantly update a learning algorithm

— Dalvi et al., Adversarial Classification, KDD 2004

Optimize worst-case performance

— Globerson and Roweis, Nightmare at Test Time: Robust Learning by Feature Selection, ICML 2006

Goals

— How to evaluate a defensive learning algorithm?

— How to construct a resilient defensive learning algorithm?

Page 6

Game Theoretic Framework

Often a classifier is updated after observing the adversary's actions; spam filters are a typical example.

Adversarial Stackelberg Game

Two players take sequential actions.

1. The adversary moves first by choosing a transformation $T$: $f_b(x) \longrightarrow f_b^T(x)$.

2. After observing $T$, the defender sets parameter values for a learning algorithm and creates a defensive rule $h(x)$.

$f(x) = p_g f_g(x) + p_b f_b^T(x)$, with $p_g + p_b = 1$. Overall it is a mixture distribution with two classes.

Page 7

Game Theoretic Framework

— The adversary's payoff is defined as

$$u_b(T, h) = \int_{L(h,g)} g(T, x)\, f_b^T(x)\, dx.$$

$g(T, x)$ is the profit for a "bad" object $x$ being classified as a "good" one after transformation $T$ is applied. There is a penalty for transformation. $L(h, g)$ is the region where objects are identified as legitimate under the decision rule $h(x)$.

— Let $C(T, h)$ be the misclassification cost. The defender's payoff is $u_g(T, h) = -C(T, h)$.

— $h_T(x)$ is the defender's best response to $T$.

Page 8

Game Theoretic Framework

— The adversary's gain from applying transformation $T$ is $W(T) = u_b(T, h_T)$:

$$W(T) = \int_{L(h_T, g)} g(T, x)\, f_b^T(x)\, dx = E_{f_b^T}\!\left[ I_{\{L(h_T, g)\}}(x)\, g(T, x) \right].$$

— $(T^e, h_{T^e})$ is a subgame perfect equilibrium, such that

$$T^e = \arg\max_{T \in S} W(T).$$

— After an equilibrium is reached, neither player has an incentive to change its action.
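Since $W(T)$ is an expectation under $f_b^T$, it can be estimated by Monte Carlo. Below is a minimal sketch; the density, profit function, and legitimate region are hypothetical stand-ins, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def gain_mc(sample_bad_T, profit, in_legit_region, n=100_000):
    """Monte Carlo estimate of W(T) = E_{f_b^T}[ I_{L(h_T,g)}(x) g(T, x) ]:
    draw transformed bad objects and average profit over those landing in
    the region the defender's best response labels legitimate."""
    x = sample_bad_T(n)
    return np.mean(in_legit_region(x) * profit(x))

# Hypothetical 1-D instance: bad objects N(3,1) shifted toward the good
# class, flat profit discounted by the shift, defender accepts x < 2.
shift = 0.8
w_hat = gain_mc(
    sample_bad_T=lambda n: rng.normal(3.0 - shift, 1.0, n),
    profit=lambda x: np.full_like(x, max(1.0 - 0.5 * shift, 0.0)),
    in_legit_region=lambda x: x < 2.0,
)
print(f"estimated W(T): {w_hat:.3f}")
```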

Page 9

Game Theoretic Framework

Attribute selection is important in adversarial learning.

Example: Gaussian mixture distribution, Naive Bayes classifier, and linear penalty for transformation.

       π_g       π_b        Penalty     Equi. Bayes Error
X1     N(1,1)    N(3,1)     a = 1       0.16
X2     N(1,1)    N(3.5,1)   a = 0.45    0.13
X3     N(1,1)    N(4,1)     a = 0       0.23
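To illustrate how an equilibrium error of this kind can be computed, here is a hedged sketch for one attribute: the adversary shifts its mean toward the good class under a linear penalty $a$ per unit shift, the defender best-responds with the midpoint threshold, and a grid search finds the adversary's optimal shift. The gain function is a simplified stand-in for the paper's payoff, so the numbers only qualitatively match the table.

```python
import numpy as np
from scipy.stats import norm

def bayes_error(mu_g, mu_b, sigma=1.0, p_g=0.5):
    """Bayes error of a 1-D two-Gaussian mixture with equal variances and
    priors: the optimal threshold is the midpoint of the two means."""
    thr = 0.5 * (mu_g + mu_b)
    return p_g * norm.sf(thr, mu_g, sigma) + (1 - p_g) * norm.cdf(thr, mu_b, sigma)

def adversary_gain(t, mu_g, mu_b, a, sigma=1.0):
    """Simplified stand-in gain: evasion probability against the defender's
    best-response threshold, minus a linear penalty a * t for shifting."""
    mu_shift = mu_b - t                      # bad mean moves toward mu_g
    thr = 0.5 * (mu_g + mu_shift)            # defender best-responds
    return norm.cdf(thr, mu_shift, sigma) - a * t

mu_g, mu_b, a = 1.0, 3.0, 1.0                # the X1 row of the table
ts = np.linspace(0.0, mu_b - mu_g, 201)
t_star = max(ts, key=lambda t: adversary_gain(t, mu_g, mu_b, a))
print(t_star, bayes_error(mu_g, mu_b - t_star))
```

Under this stand-in gain, a penalty of $a = 1$ outweighs any evasion benefit, so the optimal shift is zero and the error stays at the untransformed Bayes error of about 0.16, consistent with the X1 row.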

Page 10

Game Theoretic Framework

— Discretize the continuous numerical variables.

— Use the product of one-dimensional marginal distributions to approximate the joint distribution.

— Search for an equilibrium using linear programming.

Case study: Lending Club Data

Peer-to-peer financing, offering small loans ranging from $1,000 to $25,000.

Except for the credit report, many fields are easy to lie about, such as job information, purpose of the loan, etc.

Page 11

Game Theoretic Framework

We use the following attributes from the data set.

X1 – Amount requested

X2 – Loan purpose

X3 – Debt-to-income ratio

X4 – Home ownership (any, none, rent, own, mortgage)

X5 – Monthly income

X6 – FICO range

X7 – Open credit lines

X8 – Total credit lines

X9 – Revolving credit balance

X10 – Revolving line utilization

X11 – Inquiries in the last 6 months

X12 – Accounts now delinquent

X13 – Delinquent amount

X14 – Delinquencies in last 2 years

X15 – Months since last delinquency

Page 12

Game Theoretic Framework

— The instances are originally classified as:
1) "Removed"; 2) "Loan is being issued"; 3) "Late (31-120 days)"; 4) "Late (16-30 days)"; 5) "Issued"; 6) "In review"; 7) "In grace period"; 8) "In funding"; 9) "Fully paid"; 10) "Expired"; 11) "Default"; 12) "Current"; 13) "Charged Off"; 14) "New".

— 5850 instances after data cleaning.

— $p_g = p_b = 0.5$.

— Numerical attributes are discretized using 10 equi-width buckets.

— Naive Bayes classifier with independent attributes.

— Attacker transforms each attribute independently.
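As a concrete illustration of the discretization step, here is a minimal equal-width bucketizer; the bucket-index convention is our assumption, since the slides only state 10 equi-width buckets.

```python
import numpy as np

def equi_width_buckets(x, n_buckets=10):
    """Map a numerical attribute to bucket indices 0..n_buckets-1 using
    equal-width bins spanning the observed range of the attribute."""
    edges = np.linspace(x.min(), x.max(), n_buckets + 1)
    return np.minimum(np.digitize(x, edges[1:-1]), n_buckets - 1)

loan_amount = np.array([1000, 2500, 7000, 12000, 25000], dtype=float)
print(equi_width_buckets(loan_amount))   # -> [0 0 2 4 9]
```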

Page 13

Game Theoretic Framework

$$\max \;\; -\sum_{i=1}^{m}\sum_{j=1}^{m} f_{ij}\, d_{ij} \;+\; \sum_{i=1}^{m} I_{\{y_i < q_i\}}\, y_i\, u_i$$

subject to

$$f_{ij} \geq 0, \quad i = 1, \ldots, m,\; j = 1, \ldots, m \qquad (1)$$

$$\sum_{j=1}^{m} f_{ij} = p_i, \quad i = 1, \ldots, m \qquad (2)$$

$$\sum_{i=1}^{m} f_{ij} = y_j, \quad j = 1, \ldots, m \qquad (3)$$

$$\sum_{j=1}^{m} y_j = 1 \qquad (4)$$

$p_i$ is the probability of $x = w_i$ given that the instance is in the bad class.
$q_i$ is the probability of $x = w_i$ given that the instance is in the good class.
$f_{ij}$ is the proportion moved from $w_i$ to $w_j$ by the adversary.
$d_{ij}$ is the penalty of transforming $X$ from $w_i$ to $w_j$.
$u_i$ is the expected profit.
$y_i$ is the probability after transformation.
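The indicator $I_{\{y_i < q_i\}}$ makes the objective piecewise; one common device is to freeze the indicator to a guess, solve the resulting LP, and iterate. A sketch of one such LP step with scipy follows; the iteration scheme and the toy numbers are our assumptions, not the paper's exact routine.

```python
import numpy as np
from scipy.optimize import linprog

def adversary_lp_step(p, q, d, u, evade_mask):
    """Solve the LP above with I{y_j < q_j} frozen to `evade_mask`:
    variables are movements f[i, j] >= 0 with row sums p_i (constraint 2);
    then y_j = sum_i f[i, j] (constraint 3).  linprog minimizes, so the
    payoff is negated."""
    m = len(p)
    c = (d - np.where(evade_mask, u, 0.0)[None, :]).ravel()
    A_eq = np.kron(np.eye(m), np.ones(m))      # row-sum constraints (2)
    res = linprog(c, A_eq=A_eq, b_eq=p, bounds=[(0, None)] * (m * m))
    f = res.x.reshape(m, m)
    return f, f.sum(axis=0)                    # movements f and y_j

# Toy 3-bucket instance with hypothetical penalties and profits.
p = np.array([0.5, 0.3, 0.2])                  # bad-class mass per bucket
q = np.array([0.2, 0.3, 0.5])                  # good-class mass per bucket
d = np.abs(np.subtract.outer(np.arange(3.), np.arange(3.)))  # move penalty
u = np.ones(3)                                 # profit when undetected
f, y = adversary_lp_step(p, q, d, u, evade_mask=p < q)       # initial guess
print(np.round(f, 3), np.round(y, 3))
```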

Page 14

Game Theoretic Framework

— The worst case scenario has transformation cost 0.

— For the penalized case, we assume improving an attribute by one level costs $500 (results do not change in the $300-$600 range).

— Attribute selection (choose the top 6 attributes according to one-dimensional classification accuracy after transformation) works well.

— Equilibrium performance is a good indicator of the long-term success of a learning algorithm.

— Penalty prevents aggressive attacks.

Experiment Type                              Accuracy
Initial data set prior to transformation     89.95%
Worst case scenario                          61.25%
Penalized transformation scenario            72.28%
Attribute selection scenario                 70.92%

Page 15

Leader vs. Follower

How does the defender being a leader versus being a follower affect its equilibrium strategies?

One defender with utility $D(h)$.

$m$ adversaries, each with utility $A_i(t_i)$.

$h$ and the $t_i$ are the players' strategies.

Page 16

Leader vs. Follower

Defender being the leader

A one-leader-m-follower game.

1. Given a fixed leader's strategy $h$, assume the $m$ adversaries' (i.e., the followers') strategies are the attacks $T = (t_1, \cdots, t_m)$. For the $i$-th adversary, further assume all other adversaries' strategies are fixed, i.e., fixed $t_j$, $\forall j \neq i$. Solve the following optimization for $t_i^h$:

$$t_i^h = \arg\max_{t_i \in S_i} A_i(t_i, h)$$

2. With the solution from above, $T^h = (t_1^h, \cdots, t_m^h)$ is the $m$ adversaries' joint optimal attack for a given defender strategy $h$. The defender then solves another optimization problem:

$$h^e = \arg\max_{h \in H} D(t_1^h, \cdots, t_m^h, h)$$

$(h^e, t_1^{h^e}, \cdots, t_m^{h^e})$ is an equilibrium strategy for all players in the game.

Page 17

Leader vs. Follower

Defender being the follower

An $m$-leader-one-follower game.

1. Given the joint attacks from the $m$ adversaries, $T = (t_1, \cdots, t_m)$, solve for the defender's optimal strategy:

$$h_T = \arg\max_{h \in H} D(t_1, \cdots, t_m, h)$$

2. With the solution above as the defender's optimal strategy $h_T$ against the joint attacks $T = (t_1, \cdots, t_m)$, solve for the optimal joint attacks $T^e$:

$$T^e = (t_1^e, \ldots, t_m^e) = \arg\max_{t_i \in S_i, \forall i} \; \sum_{i=1}^{m} A_i(t_i, h_T)$$

$(h_{T^e}, t_1^e, \cdots, t_m^e)$ is an equilibrium strategy for all players in the game.

Page 18

Leader vs. Follower

Gaussian Mixture Population

The defender controls the normal population $N_p(\mu_g, \Sigma_g)$.

Each adversary controls a population $N_p(\mu_{b_i}, \Sigma_{b_i})$.

Under an attack $t$, adversarial objects move toward $\mu_g$, the center of the normal population: $X_t = \mu_g + (1 - t)(X - \mu_g)$.

$0 \leq t \leq 1$; $t = 1$ is the strongest attack.

The adversary's population after the attack is $N_p((1 - t)\mu_b + t\mu_g, (1 - t)^2 \Sigma_b)$.

The $m$ adversaries' strategy is the joint attack $(t_1, \cdots, t_m)$.

The defender's strategy is to build a defensive wall around the center $\mu_g$.
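The attack transformation is easy to verify empirically. A small sketch; the dimension and parameter values here are arbitrary.

```python
import numpy as np

def attack(X, mu_g, t):
    """X_t = mu_g + (1 - t) * (X - mu_g): shrink adversarial objects a
    fraction t of the way toward the normal center; t = 1 lands on mu_g."""
    return mu_g + (1.0 - t) * (X - mu_g)

rng = np.random.default_rng(1)
mu_g, mu_b = np.zeros(2), np.array([4.0, 4.0])
X_b = rng.multivariate_normal(mu_b, np.eye(2), size=5000)
X_t = attack(X_b, mu_g, t=0.5)
# Sample moments match N((1 - t) mu_b + t mu_g, (1 - t)^2 Sigma_b).
print(X_t.mean(axis=0))           # approx [2, 2]
print(np.cov(X_t, rowvar=False))  # approx 0.25 * I
```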

Page 19

Leader vs. Follower

A Euclidean defensive wall is an ellipsoid-shaped defensive region:

$$(X - \hat{\mu}_g)'\, S_g^{-1}\, (X - \hat{\mu}_g) = \chi^2_p(\alpha).$$

$0 < \alpha < 1$ controls the size. It is an approximate confidence region for a multivariate normal.

A Manhattan defensive wall forms a diamond-shaped region:

$$\sum_{i=1}^{p} \frac{|X_i - \hat{\mu}_{g_i}|}{\hat{\sigma}_{g_i}} = \eta,$$

where $\hat{\sigma}_{g_i}$ is the sample standard deviation, $\hat{\mu}_{g_i}$ is the sample mean, and $\eta$ controls the size.
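Both walls reduce to simple membership tests. A sketch of the two decision rules as stated above:

```python
import numpy as np
from scipy.stats import chi2

def inside_euclidean_wall(X, mu_hat, S_g, alpha):
    """Ellipsoid wall: (x - mu)' S_g^{-1} (x - mu) <= chi2_p(alpha)."""
    d = X - mu_hat
    md2 = np.einsum("ij,jk,ik->i", d, np.linalg.inv(S_g), d)
    return md2 <= chi2.ppf(alpha, df=X.shape[1])

def inside_manhattan_wall(X, mu_hat, sigma_hat, eta):
    """Diamond wall: sum_i |x_i - mu_i| / sigma_i <= eta."""
    return (np.abs(X - mu_hat) / sigma_hat).sum(axis=1) <= eta
```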

Page 20

Leader vs. Follower

Obtain $\eta(\alpha)$ as a function of $\alpha$:

Using the sample mean vector $\hat{\mu}_g$ and variance-covariance matrix $S_g$ from the normal objects, generate a large sample from $N(\hat{\mu}_g, S_g)$. For an $\eta(\alpha)$, $(100\alpha)\%$ of the generated sample points fall into the diamond-shaped Manhattan region with vertices at $\hat{\mu}_{g_i} \pm \eta(\alpha)\hat{\sigma}_{g_i}$ on each dimension.
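A minimal version of that calibration, following the simulation recipe just described:

```python
import numpy as np

def eta_of_alpha(mu_hat, S_g, alpha, n=200_000, seed=0):
    """Simulate from N(mu_hat, S_g) and return the alpha-quantile of the
    scaled L1 distance, so (100*alpha)% of simulated normal points fall
    inside the resulting Manhattan diamond."""
    rng = np.random.default_rng(seed)
    X = rng.multivariate_normal(mu_hat, S_g, size=n)
    sigma_hat = np.sqrt(np.diag(S_g))
    l1 = (np.abs(X - mu_hat) / sigma_hat).sum(axis=1)
    return np.quantile(l1, alpha)

print(eta_of_alpha(np.zeros(2), np.eye(2), alpha=0.9))
```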

A defender's strategy is to choose $h = \alpha$ or $h = \eta(\alpha)$.

The defender's utility is $D(\alpha)$ or $D(\eta(\alpha))$. Let $c$ be the misclassification cost:

$$D(\alpha) = -100\,(\text{error-of-normal} + c \times \text{error-of-adversary}).$$

An adversary's utility is $L(t) = E\left[\max\{k - a\,\|X_t - X\|^2,\, 0\}\right]$, where $k$ is the maximum utility of an adversarial object and $\|X_t - X\|^2$ measures how far an adversarial object is moved toward the normal population.

An exhaustive search finds the equilibria of the games.
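The leader-follower structure makes the exhaustive search a nested argmax. A sketch for a single adversary; the utility functions D and A below are hypothetical placeholders merely shaped like the definitions above.

```python
import numpy as np

def defender_leader_equilibrium(alphas, ts, D, A):
    """Defender as leader: for each wall size alpha the adversary
    best-responds with t(alpha); the defender then picks the alpha that
    maximizes its own utility against that best response."""
    best_response = lambda a: max(ts, key=lambda t: A(t, a))
    alpha_e = max(alphas, key=lambda a: D(a, best_response(a)))
    return alpha_e, best_response(alpha_e)

# Hypothetical toy utilities: the defender pays for normal points outside
# the wall and for adversaries inside it; the adversary's gain
# k - a * ||X_t - X||^2 decays as it moves farther toward the normal center.
D = lambda a, t: -100.0 * ((1.0 - a) + 0.5 * t * a)
A = lambda t, a: max(1.0 - 0.8 * t ** 2, 0.0) * t * a
alphas, ts = np.linspace(0.5, 0.99, 50), np.linspace(0.0, 1.0, 101)
print(defender_leader_equilibrium(alphas, ts, D, A))
```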

Page 21

Leader vs. Follower

Euclidean defensive wall. Left: defender as the leader; right: defender as the follower.

Page 22

Leader vs. Follower

Manhattan defensive wall. Left: defender as the leader; right: defender as the follower.

Page 23

Adversarial Clustering

Previous adversarial learning work assumes the availability of large amounts of labeled instances.

We develop a novel grid-based adversarial clustering algorithm using mostly unlabeled objects with a few labeled instances.

A classifier created using very few labeled objects is inaccurate.

We identify 1) the centers of normal objects, 2) sub-clusters of attack objects, 3) the overlapping areas, 4) outliers and unknown clusters.

We draw defensive walls around the centers of the normal objects to block out attack objects. The size of a defensive wall is based on the previous game-theoretic study.

Page 24

Adversarial Clustering

It is not semi-supervised learning.

We operate under very different assumptions.

In adversarial settings, objects similar to each other belong to different classes, while objects in different clusters belong to the same class.

Within a cluster, objects from two classes can overlap significantly.

Adversaries can bridge the gap between two previously well separated clusters.

Semi-supervised learning assigns labels to all the unlabeled objects, aiming at the best accuracy.

We do not label the overlapping regions and outliers.

Page 25

Adversarial Clustering

We have nearly purely normal objects inside the defensive walls, despite an increased Bayes error.

It is like airport security: a small number of passengers use the fast pre-check lane, while all other passengers go through the security check. The goal is to let not a single terrorist enter the airport, at the cost of blocking many normal objects.

A classification boundary, here a defensive wall, is analogous to a point estimate, while the overlapping areas are analogous to confidence regions.

Page 26

Adversarial Clustering

A Grid Based Adversarial Clustering Algorithm

— Initialization: set parameter values
— Merge 1: create labeled normal and abnormal sub-clusters
— Merge 2: cluster the remaining unlabeled data points
— Merge 3: merge all the data points without considering the labels
— Match: match the big unlabeled clusters with the normal and abnormal clusters, and with the clusters of the remaining unlabeled data points from the first pass
— Draw α-level defensive walls inside the normal regions

A weight parameter k controls the size of the overlapping areas.
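A compressed sketch of the grid idea follows, covering only cell assignment and cell labeling; the three merge passes, the matching step, and the defensive walls are omitted, and the role given to k below is a simplified stand-in for the paper's weight parameter.

```python
import numpy as np
from collections import Counter

def grid_cells(X, n_bins=20):
    """Assign each point to a hyper-rectangular grid cell."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    idx = np.floor((X - lo) / (hi - lo + 1e-12) * n_bins).astype(int)
    return [tuple(r) for r in np.clip(idx, 0, n_bins - 1)]

def label_cells(cells, labels, k=10):
    """Label a cell normal/abnormal only when one class dominates the
    labeled points inside it by a factor k; otherwise mark it mixed."""
    votes = {}
    for c, y in zip(cells, labels):
        if y is not None:                 # y in {"normal", "abnormal", None}
            votes.setdefault(c, Counter())[y] += 1
    out = {}
    for c, v in votes.items():
        top, n1 = v.most_common(1)[0]
        n2 = sum(v.values()) - n1
        out[c] = top if n2 == 0 or n1 >= k * n2 else "mixed"
    return out
```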

Page 27

Adversarial Clustering

We compare with two semi-supervised learning methods, EM least squares and S4VM. $\alpha = 0.6$. True labels are shown below. Blue is for normal; orange is for abnormal; purple for unlabeled; yellow and black for outliers.

[Scatter plots of the true labels for Experiments 1 and 2.]

Page 28

Adversarial Clustering

Exp. 1. Left: ADClust with k = 10; right: ADClust with k = 20.

Page 29

Adversarial Clustering

Exp. 1. Left: EM least squares; right: S4VM.

Page 30

Adversarial Clustering

Exp. 2. Left: ADClust with k = 10; right: ADClust with k = 20.

Page 31

Adversarial Clustering

Exp. 2. Left: EM least squares; right: S4VM.

Page 32

Adversarial Clustering

Experiment 1 has an attack taking place between two normal clusters. Experiment 2 has an unknown cluster, with normal and abnormal clusters heavily mixed under a strong attack.

Our algorithm found the core areas of the normal objects.

The semi-supervised methods wrongly labeled a whole normal cluster in Experiment 1, and labeled the unknown cluster as normal in Experiment 2.

A smaller k is more conservative, creating larger unlabeled mixed areas.

Page 33

Adversarial Clustering

KDD Cup 1999 Network Intrusion Data

Around 40 percent of the instances are network intrusions.

We use 25,192 instances from the training data and the top 7 continuous features.

Experiment 1 has 100 runs. In each run, 150 instances are randomly sampled with their labels kept; the remaining 99.4% are unlabeled.

Experiment 2 has 100 runs. In each run, 100 instances are randomly sampled with their labels kept; the remaining 99.6% are unlabeled.

A larger k produces fewer unlabeled points and bigger normal regions containing more attack instances. It is more aggressive.

Page 34

Adversarial Clustering

Exp. 1. Increasing the weight k from 1 to 100. Top left: percent of abnormal points in the mixed region; bottom left: percent of abnormal points among outliers; top right: number of points in the mixed region; bottom right: number of points as outliers.

Page 35

Adversarial Clustering


Exp. 2. k equals 1, 30, and 50 for low, medium, and high weights. The α levels are set from 0.6 to 0.95. The KDD data is highly mixed, yet we achieve on average a nearly 90% pure normal rate inside the defensive walls.

Page 36

Adversarial SVM

Classification boundaries. + is for the untransformed “bad” objects; o is for the “good” objects; * is for the transformed “bad” objects, i.e., the attack objects. The black dashed line is the standard SVM classification boundary, and the blue line is the Adversarial SVM classification boundary. Both the untransformed and the transformed “bad” objects are what we want to detect and block.

Page 37

Adversarial SVM

AD-SVM solves a convex optimization problem whose constraints are tied to adversarial attack models.

Free-range attack: the adversary can move attack objects anywhere in the domain.

$$C_f(x_{\min,j} - x_{ij}) \leq \delta_{ij} \leq C_f(x_{\max,j} - x_{ij})$$

The $j$th feature of any instance $x_i$ falls between $x_{\min,j}$ and $x_{\max,j}$.

$C_f \in [0, 1]$. $C_f = 0$ means no attack; $C_f = 1$ is the most aggressive attack.
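In matrix form the free-range bounds are straightforward to build from the training data. A small sketch:

```python
import numpy as np

def free_range_bounds(X, C_f):
    """Per-feature movement limits for the free-range model:
    C_f * (x_min_j - x_ij) <= delta_ij <= C_f * (x_max_j - x_ij)."""
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return C_f * (x_min - X), C_f * (x_max - X)   # (lower, upper) per entry
```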

Page 38

Adversarial SVM

Targeted attack: the adversary can only move attack instances closer to a targeted value.

$x_i^t$ is the target. The adversary adds $\delta_{ij}$ to the $j$-th feature $x_{ij}$ of an attack object, with $|\delta_{ij}| \leq |x_{ij}^t - x_{ij}|$.

An upper bound on the amount of movement for $x_{ij}$:

$$0 \leq (x_{ij}^t - x_{ij})\,\delta_{ij} \leq C_\xi \left(1 - \frac{C_\delta\,|x_{ij}^t - x_{ij}|}{|x_{ij}| + |x_{ij}^t|}\right)(x_{ij}^t - x_{ij})^2$$

$C_\delta$ reflects the loss of malicious utility; it can be larger than 1.

$1 - \frac{C_\delta\,|x_{ij}^t - x_{ij}|}{|x_{ij}| + |x_{ij}^t|}$ is the maximum fraction of $x_{ij}^t - x_{ij}$ that $\delta_{ij}$ is allowed to be.

$C_\xi \in [0, 1]$ is a discount factor; a larger $C_\xi$ allows more movement.
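The targeted-attack cap transcribes directly; the clamp at zero below is our added safeguard for the case where the discount term goes negative.

```python
import numpy as np

def targeted_move_cap(X, X_target, C_delta, C_xi, eps=1e-12):
    """Upper bound on (x^t_ij - x_ij) * delta_ij: movement toward the target
    is discounted by C_xi and shrinks as C_delta |x^t - x| / (|x| + |x^t|)
    grows."""
    gap = X_target - X
    frac = 1.0 - C_delta * np.abs(gap) / (np.abs(X) + np.abs(X_target) + eps)
    return C_xi * np.maximum(frac, 0.0) * gap ** 2
```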

Page 39

Adversarial SVM

SVM risk minimization model: free-range attack

$$\begin{aligned}
\operatorname*{argmin}_{w,\, b,\, \xi_i,\, t_i,\, u_i,\, v_i} \quad & \tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i \\
\text{s.t.} \quad & \xi_i \geq 0 \\
& \xi_i \geq 1 - y_i\,(w \cdot x_i + b) + t_i \\
& t_i \geq \sum_j C_f \left( v_{ij}(x_j^{\max} - x_{ij}) - u_{ij}(x_j^{\min} - x_{ij}) \right) \\
& u_i - v_i = \tfrac{1}{2}(1 + y_i)\, w \\
& u_i \succeq 0, \quad v_i \succeq 0
\end{aligned}$$

SVM risk minimization model: targeted attack

$$\begin{aligned}
\operatorname*{argmin}_{w,\, b,\, \xi_i,\, t_i,\, u_i,\, v_i} \quad & \tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i \\
\text{s.t.} \quad & \xi_i \geq 0 \\
& \xi_i \geq 1 - y_i\,(w \cdot x_i + b) + t_i \\
& t_i \geq \sum_j e_{ij} u_{ij} \\
& (-u_i + v_i) \circ (x_i^t - x_i) = \tfrac{1}{2}(1 + y_i)\, w \\
& u_i \succeq 0, \quad v_i \succeq 0
\end{aligned}$$

with $e_{ij}$ the movement bound from the targeted attack model above.
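The free-range model is a quadratic program with affine constraints, so it can be transcribed almost verbatim into a convex solver. A sketch with cvxpy follows; the solver choice is ours, not the authors'. Here y is ±1 and rows of X are instances.

```python
import cvxpy as cp
import numpy as np

def adsvm_free_range(X, y, C=1.0, C_f=0.5):
    """AD-SVM under the free-range attack model, transcribing the
    constraints above: u_i - v_i = (1 + y_i)/2 * w couples the worst-case
    movement term t_i to the margin constraint for the attacked class."""
    n, p = X.shape
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    w, b = cp.Variable(p), cp.Variable()
    xi, t = cp.Variable(n), cp.Variable(n)
    u, v = cp.Variable((n, p), nonneg=True), cp.Variable((n, p), nonneg=True)
    coef = 0.5 * (1 + y)                     # 1 for y = +1, 0 for y = -1
    cons = [
        xi >= 0,
        xi >= 1 - cp.multiply(y, X @ w + b) + t,
        t >= C_f * cp.sum(cp.multiply(v, x_max[None, :] - X)
                          - cp.multiply(u, x_min[None, :] - X), axis=1),
        u - v == coef[:, None] @ cp.reshape(w, (1, p)),
    ]
    prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)), cons)
    prob.solve()
    return w.value, b.value

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.r_[np.ones(50), -np.ones(50)]
print(adsvm_free_range(X, y))
```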

Page 40

Adversarial SVM

The spambase data set is from the UCI data repository. Accuracy of AD-SVM, SVM, and one-class SVM on the spambase data as free-range attacks intensify. $C_f$ increases as attacks become more aggressive.

Attacks are generated as $\delta_{ij} = f_{attack}\,(x_{ij}^t - x_{ij})$.

                      f_attack = 0   f_attack = 0.3   f_attack = 0.5   f_attack = 0.7   f_attack = 1.0
AD-SVM, C_f = 0.1     0.882          0.852            0.817            0.757            0.593
AD-SVM, C_f = 0.3     0.880          0.864            0.833            0.772            0.588
AD-SVM, C_f = 0.5     0.870          0.860            0.836            0.804            0.591
AD-SVM, C_f = 0.7     0.859          0.847            0.841            0.814            0.592
AD-SVM, C_f = 0.9     0.824          0.829            0.815            0.802            0.598
SVM                   0.881          0.809            0.742            0.680            0.586
One-class SVM         0.695          0.686            0.667            0.653            0.572
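For reference, the test-time attacks behind these tables move each bad instance a fraction of the way to a target. A one-line generator; that the targets X_target are drawn from the innocuous class is our assumption about the setup.

```python
import numpy as np

def generate_attack(X_bad, X_target, f_attack):
    """delta_ij = f_attack * (x^t_ij - x_ij): move each bad instance a
    fraction f_attack of the way toward its target."""
    return np.asarray(X_bad) + f_attack * (np.asarray(X_target) - np.asarray(X_bad))
```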

Page 41

Adversarial SVM

$C_\delta$ decreases as targeted attacks become more aggressive. $C_\xi = 1$.

                      f_attack = 0   f_attack = 0.3   f_attack = 0.5   f_attack = 0.7   f_attack = 1.0
AD-SVM, C_δ = 0.9     0.874          0.821            0.766            0.720            0.579
AD-SVM, C_δ = 0.7     0.888          0.860            0.821            0.776            0.581
AD-SVM, C_δ = 0.5     0.874          0.860            0.849            0.804            0.586
AD-SVM, C_δ = 0.3     0.867          0.855            0.845            0.809            0.590
AD-SVM, C_δ = 0.1     0.836          0.840            0.839            0.815            0.597
SVM                   0.884          0.812            0.761            0.686            0.591
One-class SVM         0.695          0.687            0.676            0.653            0.574

AD-SVM is more resilient to modest attacks than the other SVM learning algorithms.

Page 42

Funding Support

— ARO W911NF-17-1-0356: Data Analytics for Cyber Security: Defeating the Active Adversaries, UT Dallas PI: M. Kantarcioglu, Purdue PI: B. Xi, $470,000, 08/07/2017 – 08/06/2020, Amount Responsible: $222,500

— ARO W911NF-12-1-0558: A Game Theoretic Framework for Adversarial Classification, UT Dallas PI: M. Kantarcioglu, Purdue PI: B. Xi, $440,000, 08/01/2012 – 07/31/2015, Amount Responsible: $210,140

Publications

— Kantarcioglu, M., Xi, B., and Clifton, C., Classifier Evaluation and Attribute Selection against Active Adversaries, Data Mining and Knowledge Discovery, 22(1-2), 291-335, 2011

— Zhou, Y., Kantarcioglu, M., Thuraisingham, B., and Xi, B., Adversarial Support Vector Machine Learning, Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2012), 1059-1067

Page 43

— Kantarcioglu, M.∗, and Xi, B.∗, Adversarial Data Mining, North Atlantic Treaty Organization (NATO) SAS-106 Symposium on Analysis Support to Decision Making in Cyber Defense, 1-11, June 2014, Estonia

— Zhou, Y., Kantarcioglu, M., Xi, B., (Hot Topic Essay) Adversarial Learning: Mind Games with the Opponent, ACM Computing Reviews, August 2017

— Zhou, Y., Kantarcioglu, M., Xi, B., A Survey of Game Theoretic Approach for Adversarial Machine Learning, revision submitted

— Wei, W., Xi, B., Kantarcioglu, M., Adversarial Clustering: A Grid Based Clustering Algorithm against Active Adversaries, submitted

— Zhou, Y., Kantarcioglu, M., Xi, B., Adversarial Active Learning, submitted