IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. , NO

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. , NO. , SEPTEMBER 2021 1

A Comprehensive Evaluation Framework for DeepModel Robustness

Aishan Liu, Xianglong Liu†, Jun Guo, Jiakai Wang, Yuqing Ma, Ze Zhao,Xinghai Gao, and Gang Xiao

Abstract—Deep neural networks (DNNs) have achieved re-markable performance across a wide area of applications.However, they are vulnerable to adversarial examples, whichmotivates the adversarial defense. By adopting simple evaluationmetrics, most of the current defenses only conduct incompleteevaluations, which are far from providing comprehensive un-derstandings of the limitations of these defenses. Thus, mostproposed defenses are quickly shown to be attacked successfully,which result in the “arm race” phenomenon between attackand defense. To mitigate this problem, we establish a modelrobustness evaluation framework containing a comprehensive,rigorous, and coherent set of evaluation metrics, which couldfully evaluate model robustness and provide deep insights intobuilding robust models. With 23 evaluation metrics in total,our framework primarily focuses on the two key factors ofadversarial learning (i.e., data and model). Through neuroncoverage and data imperceptibility, we use data-oriented metricsto measure the integrity of test examples; by delving into modelstructure and behavior, we exploit model-oriented metrics tofurther evaluate robustness in the adversarial setting. To fullydemonstrate the effectiveness of our framework, we conductlarge-scale experiments on multiple datasets including CIFAR-10 and SVHN using different models and defenses with ouropen-source platform AISafety1. Overall, our paper aims toprovide a comprehensive evaluation framework which coulddemonstrate detailed inspections of the model robustness, andwe hope that our paper can inspire further improvement to themodel robustness.

Index Terms—Adversarial examples, evaluation metrics, modelrobustness.

I. INTRODUCTION

DEEP learning models have achieved remarkable per-formance across a wide area of applications [1]–[3],

however they are susceptible to adversarial examples [4].These elaborately designed perturbations are imperceptible tohumans but can easily lead DNNs to wrong predictions, threat-ening both digital and physical deep learning applications [5]–[7].

Since deep learning has been integrated into varioussecurity-sensitive applications (e.g., auto-driving, healthcare,etc.), the safety problem brought by adversarial examples

A. Liu, X. Liu, J. Guo, J. Wang, Y. Ma, and Z. Zhao are with the StateKey Lab of Software Development Environment, Beihang University, Beijing100191, China. X. Liu is also with Beijing Advanced Innovation Center forBig Data-Based Precision Medicine († Corresponding author: Xianglong Liu,[email protected])

X. Gao is with the Institute of Unmanned System, Beihang University,Beijing 100191, China.

G. Xiao is with the National Key Laboratory for Complex SystemsSimulation, Beijing, China.

1https://git.openi.org.cn/OpenI/AISafety

has attracted extensive attentions from the perspectives ofboth adversarial attack (generate adversarial examples to mis-classify DNNs) and defense (build a model that is robustto adversarial examples) [5], [8]–[19]. To improve modelrobustness against adversarial examples, a long line of ad-versarial defense methods have been proposed, e.g., defensivedistillation [20], input transformation [21]. However, most ofcurrent adversarial defenses conduct incomplete evaluations,which are far from providing comprehensive understandingsfor the limitations of these defenses. Thus, these defenses arequickly shown to be attacked successfully, which results inthe “arm race” phenomenon between attack and defense [22]–[25]. For example, by evaluating on simple white-box attacks,most adversarial defenses pose a false sense of robustness byintroducing gradient masking, which can be easily circum-vented and defeated [25]. Therefore, it is of great significanceand challenge to conduct rigorous and extensive evaluationon adversarial robustness for navigating the research field andfurther facilitating trustworthy deep learning in practice.

To rigorously evaluate the adversarial robustness for DNNs,a number of works have been adopted [26], [27]. However,most of these works focus on providing practical advices orbenchmarks for model robustness evaluation, which ignorethe significance of evaluation metrics. By adopting the simpleevaluation metrics (e.g., attack success rate, classification accu-racy), most of current studies could only use model outputs toconduct incomplete evaluation. For instance, the classificationaccuracy against an attack under specific perturbation magni-tude is reported as the primary and commonly used evaluationmetric, which is far from satisfactory to measure model intrin-sic behavior in adversarial setting. Therefore, the incompleteevaluation cannot provide comprehensive understandings ofthe strengths and limitations of these defenses.

In this work, with a hope to facilitate future research, weestablish a model robustness evaluation framework containinga comprehensive, rigorous, and coherent set of evaluationmetrics. These metrics could fully evaluate model robustnessand provide deep insights into building robust models. Thispaper focuses on the robustness of deep learning models on themost commonly studied image classification tasks with respectto `p-norm bounded adversaries and some other corruption.As illustrated in Figure 1, our evaluation framework canbe roughly divided into two parts: data-oriented and model-oriented, which focus on the two key factors of adversariallearning (i.e., data and model). Since model robustness isevaluated based on a set of perturbed examples, we firstuse data-oriented metrics regarding neuron coverage and data

arX

iv:2

101.

0961

7v1

[cs

.CV

] 2

4 Ja

n 20

21

https://git.openi.org.cn/OpenI/AISafety


Boundary-based

• EBD• EBD-2

Neuron-based• Neuron

Sensitivity• Neuron

Uncertainty

Task

• CA

• AAW• AAB• ACTC• NTE

Adversarial

• mCE• RmCE• mFR

Corruption

• CAV• CRR/CSR• CCV• COS

Defense

Model-oriented Metrics

Model BehaviorConsistency-based

• ENI

Model Structure

DNN

Data-oriented Metrics

Data

Neuron Coverage

• KMNCov• NBCov• SNACov

• 𝐴𝐿𝐷𝑝• ASS• PSD

Data Imperceptibility

Fig. 1. With 23 evaluation metrics in total, our comprehensive evaluation framework primarily focuses on the two key factors of adversarial learning (i.e.,data and model). Through neuron coverage and data imperceptibility, we use data-oriented metrics to measure the integrity of test examples; by delving intomodel structure and behavior, we exploit model-oriented metrics to further evaluate robustness in the adversarial setting.

imperceptibility to measure the integrity of test examples (i.e.,whether the conducted evaluation covers most of the neuronswithin a model); meanwhile, we focus on evaluating modelrobustness via model-oriented metrics which consider bothmodel structures and behaviors in the adversarial setting (e.g.,decision boundary, model neuron, corruption performance,etc.). Our framework contains 23 evaluation metrics in total.

To fully demonstrate the effectiveness of the evaluationframework, we then conduct large scale experiments on mul-tiple datasets (i.e., CIFAR-10 and SVHN) using differentmodels with different adversarial defense strategies. Throughthe experimental results, we could conclude that: (1) thoughshowing high performance on some simple and intuitivemetrics such as adversarial accuracy, some defenses are weakon more rigorous and insightful metrics; (2) besides `∞-norm adversarial examples, more diversified attacks shouldbe performed to conduct comprehensive evaluations (e.g.,corruption attacks, `2 adversarial attacks, etc.); (3) apart frommodel robustness evaluation, the proposed metrics shed lighton the model robustness and are also beneficial to the designof adversarial attacks and defenses. All evaluation experimentsare conducted on our new adversarial robustness evaluationplatform referred as AISafety, which could fully support ourcomprehensive evaluation. We hope our platform could facili-tate follow researchers for better understanding of adversarialexamples as well as further improvement of model robustness.

Our contributions can be summarized as follows:

• We establish a comprehensive evaluation framework formodel robustness containing 32 metrics, which couldfully evaluate model robustness and provide deep insightsinto building robust models;

• Based on our framework, we provide an open-sourcedplatform named AISafety, which supports continuous

integration of user-specific algorithms and language-independent models;

• We conduct large-scale experiments using AISafety, andwe provide preliminary suggestions to the evaluation ofmodel robustness as well as the design of adversarialattacks/defenses in the future.

The structure of the paper is illustrated as follows: Section IIintroduces the related works; Section III defines and providesthe detail of our evaluation metrics; Section IV demonstratesthe experiments; Section VI introduces our open-sourced plat-form; Section V provides some additional discussions and sug-gestions; and Section VII summarizes the whole contributionsand provides the conclusion.

II. RELATED WORK

In this section, we provide a brief overview of existing workon adversarial attacks and defenses, as well as adversarialrobustness evaluation works.

A. Adversarial attacks and defenses

Adversarial examples are inputs intentionally designed tomislead DNNs [4], [8]. Given a DNN f and an input imagex ∈ X with the ground truth label y ∈ Y, an adversarialexample xadv satisfies

f(xadv) 6= y s.t. ‖x− xadv‖ ≤ ε,

where ‖ · ‖ is a distance metric. Commonly, ‖ · ‖ is measuredby the `p-norm (p ∈{1,2,∞}).

In the past years, great efforts have been devoted to gen-erating adversarial examples in different scenarios and tasks[7]–[10], [12], [23], [28]. Adversarial attacks can be dividedinto two types: white-box attacks, in which adversaries have


the complete knowledge of the target model and can fullyaccess the model; black-box attacks, in which adversaries havelimited knowledge of the target classifier and can not directlyaccess the model. Specifically, most white-box attacks craftadversarial examples based on the input gradient, e.g., thefast gradient sign method (FGSM) [8], the projected gradientdescent method (PGD) [13], the Carlini & Wagner method(C&W) [23], Deepfool [29] , etc. For the black-box methods,they can be roughly divided into transfer-based attacks [10],score-based attacks [30]–[32], and decision-based attacks [33].

Meanwhile, to improve model robustness against adversarialexamples, various defense approaches have been proposed,including defensive distillation [20], input transformation [14],[21], [34], robust training, [13], [17], and certified defense[35]–[37]. Among the adversarial defenses, adversarial train-ing has been widely studied and demonstrated to be the mosteffective [8], [13]. Specifically, adversarial training minimizesthe worst case loss within some perturbation region for clas-sifiers, by augmenting the training set {x(i),y(i)}i=1...n withadversarial examples as follows:

minθ

∑i

maxδ`(f(xi + δ),yi),

where δ is bounded within `p perturbations with radius ε, and`(·) represents the loss function.

Besides, corruption such as snow and blur also frequentlyoccur in the real world, which also presents critical challengesfor the building of robust deep learning models. Supposing,we have a set of corruption functions C in which eachc(x) performs a different kind of corruption function. Thus,average-case model performance on small, general, classifier-agnostic corruption can be used to define model corruptionrobustness as follows:

Ec∼C [P(x,y)∼D(f(c(x)) = y)].

A concerning fact is that most proposed defenses conductincomplete or incorrect evaluations, which are quickly shownto be attacked successfully due to limited understanding ofthese defenses [22]–[25]. Consequently, conducting rigorousand comprehensive evaluation on model robustness becomesparticularly important.

B. Model robustness evaluation

To comprehensively evaluate the model robustness forDNNs, a number of works have been proposed [26], [27], [38],[39]. A uniform platform for adversarial robustness analysisnamed DEEPSEC [38] is proposed to measure the vulnerabil-ity of deep learning models. Specifically, the platform incorpo-rates 16 adversarial attacks with 10 attack utility metrics, and13 adversarial defenses with 5 defensive utility metrics. Unlikeprior works, [26] discussed the methodological foundations,reviewed commonly accepted best practices, and suggestednew methods for evaluating defenses to adversarial examples.In particular, they provided principles for performing defenseevaluations and a specific checklist for avoiding commonevaluation pitfalls. Moreover, [39] proposed a set of multi-granularity metrics for deep learning systems, which aims at

rendering a multi-faceted portrayal of the testbed (i.e., testingcoverage). More recently, [27] established a comprehensivebenchmark to evaluate adversarial robustness on image clas-sification tasks. They incorporated several adversarial attackand defense methods for robustness evaluation, including 15attack methods, 16 defense methods, and 2 evaluation metrics.

However, these studies mainly focus on establishing opensource libraries for adversarial attacks and defenses, whichfail to provide a comprehensive evaluation considering severalaspects of a deep learning model towards different noises.

III. EVALUATION METRICS

To mitigate the problem brought by incomplete evalua-tion, we establish a multi-view model robustness evaluationframework which consists of 23 evaluation metrics in total.As shown in Table I, our evaluation metrics can be roughlydivided into two parts: data-oriented and model-oriented. Wewill illustrate them in this section.

A. Data-Oriented Evaluation Metrics

Since model robustness is evaluated based on a set of per-turbed examples, the quality of the test data plays a critical rolein robustness evaluation. Thus, we use data-oriented metricsconsidering both neuron coverage and data imperceptibility tomeasure the integrity of test examples.

For traditional software engineering, researchers design andseek a series of representative test data from the whole largeinput space to detect the software bugs. Test adequacy (oftenquantified by coverage criteria) is a key factor to measurewhether the software has been comprehensively tested [46].Inspired by that, DeepGauge [39] introduced the coveragecriteria into neural networks and proposed Neuron Coverageto leverage the output values of neuron and its correspondingboundaries obtained from training data to approximate themajor function region and corner-case region at the neuron-level.

1) Neuron Coverage: We first use the coverage criteriafor DNNs to measure whether the generated testset (e.g.,adversarial examples) could cover enough amount of neuronswithin a model.k-Multisection Neuron Coverage (KMNCov). Given a neu-

ron n, the KMNCov measures how thoroughly the given setof the test inputs D covers the range of neuron output value[lown, highn], where D = {x1, x2, ...} is a set of input data.Specifically, we divide the range [lown, highn] into k sectionswith the same size (k > 0), and Sn

i denotes the i-th sectionwhere 1 ≤ i ≤ k. Let φ(x,n) denote a function that returnsthe output of a neuron n under a given input sample x ∈ D.We use φ(x,n) ∈ Sn

i to denote that the i-th section of neuronn is covered by the input x. For a given test set D and aspecific neuron n, the corresponding k-Multisection NeuronCoverage is defined as the ratio of the sections covered by Dand the overall sections. It can be written as

KMNCov(D, k) =

∑n∈N |{Sn

i |∃x ∈ D : φ(x,n) ∈ Sni }

k × |N |,

(1)


TABLE ITHE TAXONOMY AND ILLUSTRATION OF THE PROPOSED EVALUATION METRICS.

METRICS BEHAVIOR STRUCTUREADVERSARIAL

ATTACKSCORRUPTION

ATTACKSWHITEBOX BLACKBOX

SINGLEMODEL

MULTIPLEMODELS

DATA

KMNCOV [39] X X X XNBCOV [39] X X X X

SNACOV [39] X X X XALDp [38] X X XASS [40] X X XPSD [41] X X X

MODEL

CA X X XAAW X X X XAAB X X X X

ACAC [42] X X X XACTC [42] X X X XNTE [41] X X X XMCE [43] X X X X

RMCE [43] X X X XMFR [43] X X X XCAV [38] X X X X X

CRR/CSR [38] X X X X XCCV [38] X X X X XCOS [38] X X X X XEBD [44] X X X X X

EBD-2 X X X XENI [44] X X X X X

NEURON SENSITIVITY [45] X X X XNEURON UNCERTAINTY X X X X X

where N = {n1,n2, ...} is a set of neurons for the model. Itshould be noticed that for a neuron n and input x, if φ(x,n) ∈[lown, highn] is satisfied with ∀n ∈ N , we say that this DNNis located in its major function region. Otherwise, it is locatedin the corner-case region.

Neuron Boundary Coverage (NBCov). Neuron BoundaryCoverage measures how many corner-case regions have beencovered by the given test input set D. Given an input x ∈ D,a DNN is located in its corner-case region when given x,∃n ∈ N : φ(x,n) ∈ (−∞, lown) ∪ (highn,∞). Thus, theNBCov can be defined as the ratio of the covered corner casesand the total corner cased (2× ‖N‖):

|UpperCornerNeuron|+ |LowerCornerNeuron|2× |N |

, (2)

where the UpperCornerNeuron is the set consisting ofneurons that satisfy ∃x ∈ D : φ(x, n) ∈ (highn,+∞). Andthe LowerCornerNeuron is the set of neurons that satisfy∃x ∈ D : φ(x,n) ∈ (−∞, lown).

Strong Neuron Activation Coverage (SNACov). This metricis designed to measure the coverage status of upper-cornercase (i.e., how many corner cases have been covered by thegiven test sets). It can be described as the ratio of the coveredupper-corner cases and the total corner cases (|N |):

SNACov(D) =|UpperCornerNeuron|

|N |. (3)

2) Data Imperceptibility: In adversarial learning literature,the visual imperceptibility of the generated perturbation is oneof the key factor that influences model robustness. Thus, weintroduce several metrics to evaluate data visual impercepti-bility by considering the magnitude of perturbations.

Average `p Distortion (ALDp). Most adversarial attacks gen-erate adversarial examples by constructing additive `p-normadversarial perturbations (e.g., p ∈ 0, 1, ...,∞). To measure

the visual perceptibility of generated adversarial examples, weuse ALDp as the average normalized `p distortion:

ALDp =1

m

m∑i=1

||x(i)adv − x(i)||p||x(i)||p

, (4)

where m denotes the number of adversarial examples, andthe smaller ALDp is, the more imperceptible the adversarialexample is.

Average Structural Similarity (ASS). To evaluate the imper-ceptibility of adversarial examples, we further use SSIM whichis considered to be effective to measure human visual percep-tion. Thus, ASS can be defined as the average SSIM similaritybetween all adversarial examples and the corresponding cleanexamples, i.e.,

ASS =1

m

m∑i=1

SSIM(x(i)adv,x

(i)), (5)

where m denotes the number of successful adversarial ex-amples, and the higher ASS is, the more imperceptible theadversarial example is.

Perturbation Sensitivity Distance (PSD). Based on the con-trast masking theory [47], [48], PSD is proposed to evaluatehuman perception of perturbations. Thus, PSD is defined as:

PSD =1

n

n∑i=1

t∑i=1

δ(i)(j)Sen(R(x

(i)(j))), (6)

where t is the total number of pixels, δ(i)(j) represents the j-th

pixel of the i-th example. R(x(i)(j)) stands for the square sur-

rounding region of x(i)(j), and Sen(R(x

(i)(j))) = 1/std(R(x

(i)(j))).

Evidently, the smaller PSD is, the more imperceptible theadversarial example is.


B. Model-oriented Evaluation Metrics

To evaluate the model robustness, the most intuitive direc-tion is to measure the model performance in the adversarialsetting. Given an adversary Aε,p, it uses specific attack methodto generates adversarial examples xadv = Aε,p(x) for a cleanexample x with the perturbation magnitude ε under `p norm.

In particular, we aim to analyze and evaluate model robust-ness from both dynamic and static views (i.e., model behaviorsand structures). By inspecting model outputs towards noises,we can directly measure model robustness through studyingthe behaviors; meanwhile, by investigating model structures,we can provide more detailed insights into model robustness.

1) Model Behaviors: We first summarize evaluation met-rics in terms of model behaviors as follows.• Task PerformanceClean Accuracy (CA). Model accuracy on clean examples

is one of the most important properties in the adversarialsetting. A classifier achieving high accuracy against adversarialexamples but low accuracy on clean examples still cannot beemployed in practice. CA is defined as the percentage of cleanexamples that are successfully classified by a classifier f intothe ground truth classes. Formally, CA can be calculated asfollows

CA(f,D) =1

n

n∑i=1

1(f(xi) = yi), (7)

where D = {x(i),y(i)}i=1...n is the test set, 1(·) is theindicator function.• Adversarial PerformanceAdversarial Accuracy on White-box Attacks (AAW). In the

untargeted attack scenario, AAW is defined as the percentageof adversarial examples generated in the white-box setting thatare successfully misclassified into an arbitrary class except forthe ground truth class; for targeted attack, it can be measuredby the percentage of adversarial examples generated in thewhite-box setting classified as the target class. In the rest ofthe paper, we mainly focus on untargeted attacks. Thus, AAWcan be defined as:

AAW (f,D,Aε,p) =1

n

n∑i=1

1(f(Aε,p(xi)) 6= yi). (8)

Adversarial Accuracy on Black-box Attacks (AAB). Similarto AAW, AAB is defined by the percentage of adversarialexamples classified correctly by the classifier. By contrast, theadversarial examples are generated by black-box or gradient-free attacks.

Average Confidence of Adversarial Class (ACAC). Besidesmodel prediction accuracy, prediction confidence on adversar-ial examples gives further indications of model robustness.Thus, for an adversarial example, ACAC can be defined asthe average prediction confidence towards the incorrect class

ACAC(f,D,Aε,p) =1

m

m∑i=1

Pyi(Aε,p(xi)), (9)

where m is the number of adversarial examples that attacksuccessfully, Pyi is the prediction confidence of classifier ftowards the ground truth class yi.

Average Confidence of True Class (ACTC). In additionto ACAC, we also use ACTC to further evaluate to whatextent the attacks escape from the ground truth. In otherwords, ACTC can be defined as the average model predictionconfidence on adversarial examples towards the ground truthlabels, i.e.,

ACTC(f,D,Aε,p) =1

n

n∑i=1

Pyi(Aε,p(xi)). (10)

Noise Tolerance Estimation (NTE). Moreover, given thegenerated adversarial examples, we further calculate the gapbetween the probability of misclassified class and the maxprobability of all other classes as follows

NTE(f,D,Aε,p) =1

n

n∑i=1

[Pf(xi)(Aε,p(xi))−

maxPj(Aε,p(xi))],(11)

where j ∈ 1, ..., k and j 6= f(xi).• Corruption PerformanceTo further comprehensively measure the model robustness

against different corruption, we introduce evaluation metricsfollowing [43].

mCE. This metric denotes the mean corruption error of amodel compared to the baseline model [43]. Different fromthe original paper, we simply calculate the error rate of theclassifier f on each corruption type c at each level of severitys denoted as Efs,c and compute mCE as follows:

CEfc =

t∑s=1

Efs,c, (12)

where t denotes the number of severity levels. Thus, mCE isthe average value of Corruption Errors (CE) using differentcorruption.

Relative mCE. A more nuanced corruption robustness mea-sure is Relative mCE (RmCE) [43]. If a classifier withstandsmost corruption, the gap between mCE and the clean dataerror is minuscule. So, RmCE is calculated as follows:

RmCEfc =

t∑s=1

Efs,c − Efclean, (13)

where Efclean is the error rate of f on clean examples.mFR. Hendrycks et al. [43] introduce mFR to represent

the classification differences between two adjacent frames inthe noise sequence for a specific image. Let us denote q

noise sequences with S = {(x(1)i ,x

(2)i , ...,x

(n)i )}qi=1 where

each sequence is created with specific noise type c. The ‘FlipProbability’ of network f is

FP fc =1

q(n− 1)

q∑i=1

sumnj=21(f(x

(j)i ) 6= f(x

(j−1)i )).

(14)Then, the Flip Rate (FR) can be obtained by FRfc =FP fc /FP

basec and mFR is the average value of FR.

• Defense Performance


In addition to the basic metrics, we further try to explore towhat extent the model performance has been influenced whendefense strategies are added to the model.

CAV. Classification Accuracy Variance (CAV) is used toevaluate the impact of defenses based on the accuracy. Weexpect the defense-enhanced model fd to maintain the clas-sification accuracy on normal testing examples as much aspossible. Therefore, it is defined as follows:

CAV = ACC(fd, D)−ACC(f,D), (15)

where ACC(f,D) denotes model f accuracy on dataset D.CRR/CSR. CRR is the percentage of testing examples that

are misclassified by f previously but correctly classified byfd. Inversely, CSR is the percentage of testing examples thatare correctly classified by f but misclassified by fd. Thus,they are defined as follows:

CRR =1

n

n∑i=1

count((f(xi) 6= yi)&(fd(xi) = yi)), (16)

CSR =1

n

n∑i=1

count((f(xi) = yi)&(fd(xi) 6= yi)), (17)

where n is the number of examples.CCV. Defense strategies may not have negative influences on

the accuracy performance, however, the prediction confidenceof correctly classified examples may decrease. ClassificationConfidence Variance (CCV) can measure the confidence vari-ance induced by robust models:

CCV =1

n

n∑i=1

|Pyi(xi)− P dyi(xi)|, (18)

where Pyi(xi) denotes the prediction confidence of modelf towards yi and n is the number of examples correctlyclassified by both f and fd.

COS. Classification Output Stability (COS) uses JS diver-gence to measure the similarity of the classification outputstability between the original model and the robust model.It averages the JS divergence on all correctly classified testexamples:

COS =1

n

n∑i=1

JSD(P (xi)‖P d(xi)), (19)

where P (xi) and P d(xi) denotes the prediction confidenceof model f and fd on xi, respectively. n is the number ofexamples correctly classified by both f and fd.

2) Model Structures: We further provide evaluation met-rics with respect to model structures as follows.• Boundary-basedEmpirical Boundary Distance (EBD). The minimum dis-

tance to the decision boundary among data points reflects themodel robustness to small noise [49], [50]. EBD calculatesthe minimum distance to the model decision boundary in aheuristic way. A larger EBD value means a stronger model.Given a learnt model f and point xi with class label yi(i = 1, . . . , k), it first generates a set V of m randomorthogonal directions [51]. Then, for each direction in V it

estimates the root mean square (RMS) distances φi(V ) to thedecision boundary of f , until the model’s prediction changes,i.e., f(xi) 6= yi. Among φi(V ), di denotes the minimumdistance moved to change the prediction for instance xi. Then,the Empirical Boundary Distance is defined as follows:

EBD =1

n

n∑i=1

di, di = minφi(V ), (20)

where n denotes the number of instances used.Empirical Boundary Distance-2 (EBD-2). Additionally, we

introduce the evaluation metrics EBD-2, which calculatesthe minimum distance of the model decision boundary foreach class. Given a learnt model f and dataset D ={x(i),y(i)}i=1...n, for each direction j in the k classes, themetric estimates the distances dj to change the model predic-tion of x(i), i.e., f(x(i)) 6= y(i). Specifically, we use iterativeadversarial attacks (e.g., BIM) in practice and calculate thesteps used as the distance dj .• Consistency-basedε-Empirical Noise Insensitivity. [52] first introduced the

concept of learning algorithms robustness from the idea that iftwo samples are “similar” then their test errors are very close.ε-Empirical Noise Insensitivity measures the model robustnessagainst noise from the view of Lipschitz constant, and alower value indicates a stronger model. We first select n cleanexamples randomly, then m examples are generated from eachclean example via various methods, e.g., adversarial attack,Gaussian noise, blur, etc. The differences between model lossfunction are computed when clean example and correspondingpolluted examples are fed to. The different severities in lossfunction is used to measure model insensitivity and stabilityto generalized small noise within constraint ε:

If (ε) =1

n×m

n∑i=1

m∑j=1

|lf (xi|yi)− lf (µij |yi)||xi − µij |∞

s.t. |xi − µij |∞ ≤ ε,(21)

where xi, µij and yi denote the clean example, correspondingpolluted example and the class label, respectively. Moreover,lf (·|·) represents the loss function of model f .• Neuron-basedNeuron Sensitivity. Intuitively, for a model that owns strong

robustness, namely, insensitive to adversarial examples, theclean example x and the corresponding adversarial example x′

share a similar representation in the hidden layers of the model[52]. Neuron Sensitivity can be deemed as the deviation of thefeature representation in hidden layers between clean exam-ples and corresponding adversarial examples, which measuresmodel robustness from the perspective of neuron. Specifically,given a benign example xi, where i = 1, . . . ,n , from D andits corresponding adversarial example x′i from D′, we can getthe dual pair set D = {(xi,x′i)}, and then calculate the neuronsensitivity σ as follows:

σ(fl,m, D) =1

n

n∑i=1

1

dim(fl,m(xi))‖fl,m(xi)− fl,m(x′i)‖1,

(22)


where fl,m(xi) and fml (x′i) respectively represents outputs ofthe m-th neuron at the l-th layer of f towards clean example xiand corresponding adversarial example x′i during the forwardprocess. dim(·) denotes the dimension of a vector.

Neuron Uncertainty. Model uncertainty has been widelyinvestigated in safety critical applications to induce the con-fidence and uncertainty behaviors during model prediction.Motivated by the fact that model uncertainty is commonlyinduced by predictive variance, we use the variance of neuronfl,m to calculate the Neuron Uncertainty as:

U(fl,m) =1

n

n∑i=1

variance(fl,m((x(i)))). (23)

IV. EXPERIMENTS

In this section, we evaluate model robustness using ourproposed evaluation framework. We conduct experiments onimage classification benchmarks CIFAR-10 and SVHN.

A. Experiment Setup

Architecture and hyperparameters. We us WRN-28-10[53] for CIFAR-10; and VGG-16 [54] for SVHN. For fair com-parisons, we keep the architecture and main hyper-parametersthe same for all the baselines on each dataset.

Adversarial attacks. To evaluate the model robustness, wefollow existing guidelines [13], [26] and incorporate multipleadversarial attacks for different perturbation types. Specifi-cally, we adopt PGD attack [13], C&W attack [23], boundaryattack (BA) [33], SPSA [32], and NATTACK [31]. We set theperturbation magnitude for `1 attacks as 12 on CIFAR-10 andSVHN; we set the perturbation magnitude for `2 attacks as 0.5on CIFAR-10 and SVHN; we set the perturbation magnitudefor `∞ attacks as 0.03 on CIFAR-10 and SVHN. Note that,to check whether obfuscated gradient has been introduced,we adopt both the white-box and the black-box or gradient-free adversarial attacks. A more complete details of all attacksincluding hyper-parameters can be found in the SupplementaryMaterial.

Corruption attacks. To assess the corruption robustness,we evaluate models on CIFAR-10-C and CIFAR-10-P [43].These two datasets are the first choice for benchmarking modelstatic and dynamic model robustness against different commoncorruption and noise sequences at different levels of severity[43]. They are created from the test set of CIFAR-10 using 75different corruption techniques (e.g., Gaussian noise, Possionnoise, pixelation, etc.). For SVHN, we use the code providedby [43] to generate the corrupted examples.

Adversarial defenses. We use several state-of-the-art ad-versarial defense methods including TRADES [55], standardadversarial training (SAT) [8], PGD adversarial training (PAT)[13], and Rand [21]. A detailed description of the implemen-tations can be found in the Supplementary Material.

B. Model-oriented Evaluation

We first evaluate model robustness by measuring the model-oriented evaluation metrics.

1) Model Behaviors: We first evaluate model robustnesswith respect to behaviors.

As for adversarial robustness, we report metrics includingCA, AAW, AAB, ACAC, ACTC, and NTE. The experimentalresults regarding CA and AAW can be found in Table II; theresults of AAB are shown in Table III; and the results in termsof ACAC, ACTC, and NTE are listed in Figure 2. Besidesstandard black-box attacks (NATTACK, SPSA, and BA), wealso generate adversarial examples using an Inception-V3 thenperform transfer attacks on the target model (denoted “PGD-`1”, “PGD-`2”, and “PGD-`∞” in Table III).

As for corruption robustness, the results of mCE, relativemCE, and mFR can be found in Figure 3. Moreover, the resultsof CAV, CRR/CSR, CCV, and COS are illustrated in Table IV.

From the above experimental results, we can draw severalconclusions as follows: (1) TRADES achieves the highestadversarial robustness for almost all adversarial attacks in bothblack-box and white-box settings, however it is vulnerableagainst corruptions; (2) models trained on one specific per-turbation type are vulnerable to other norm-bounded pertur-bations (e.g., `∞ trained models are weak towards `1 and `2adversarial examples); and (3) according to Figure 2(b) and2(d), standard adversarially-trained models (SAT and PAT) arestill vulnerable from a more rigorous perspective by showinghigh confidence of adversarial classes and low confidence oftrue classes.

TABLE IIWHITE-BOX ADVERSARIAL ATTACKS (%) ON CIFAR-10 USING

WIDERESNET-28 AND ON SVHN USING VGG-16.

(a) CIFAR-10

MODEL CLEAN PGD-`1 PGD-`2 PGD-`∞ C&WVANILLA 93.4 10.0 0.0 0.0 0.0

PAT 81.4 10.0 0.1 37.7 50.7SAT 87.0 10.2 0.0 26.2 34.4

TRADES 85.8 10.4 0.0 47.7 57.1

(b) SVHN

MODEL CLEAN PGD-`1 PGD-`2 PGD-`∞ C&WVANILLA 94.7 8.9 3.4 3.6 13.9

PAT 92.7 7.1 0.6 44.9 46.1SAT 94.8 8.4 0.5 10.6 8.8

TRADES 89.5 6.0 0.6 51.8 44.1

2) Model Structures: We then evaluate model robustnesswith respect to structures. The results of EBD and EBD-2 are illustrated in Table V and Fig 5; the results of ε-Empirical Noise Insensitivity, Neuron Sensitivity, and NeuronUncertainty can be found in Figure 6, 7, respectively.

In summary, we can draw several interesting observations:(1) in most cases, models with higher adversarial accuracyare showing better structure robustness; (2) though showingthe highest adversarial accuracy, TRADES does not have thelargest EBD value as shown in Table V.

C. Data-oriented Evaluation

We then report the data-oriented evaluation metrics. Foreach dataset, given test set with 10000 images randomly


TABLE IIIBLACK-BOX ADVERSARIAL ATTACKS (%) ON CIFAR-10 USING

WIDERESNET-28 AND ON SVHN USING VGG-16.

(a) CIFAR-10

NATTACK SPSA BA PGD-`1 PGD-`2 PGD-`∞VANILLA 0.2 21.3 2.6 10.0 6.5 69.1

PAT 38.0 73.5 60.8 10.0 30.2 81.1SAT 27.7 80.8 62.4 9.9 13.1 86.3

TRADES 46.7 76.6 64.1 9.5 26.4 79.6

(b) SVHN

NATTACK SPSA BA PGD-`1 PGD-`2 PGD-`∞VANILLA 34.5 37.4 25.4 12.1 31.7 79.1

PAT 45.3 71.2 59.1 15.4 39.8 89.5SAT 15.8 78.5 21.5 17.4 39.3 90.3

TRADES 52.4 68.2 54.7 12.1 37.9 87.1

TABLE IVEXPERIMENTS RESULTS (%) OF CAV, CRR, CSR, CCV, AND COS ONCIFAR-10 USING WIDERESNET-28 AND ON SVHN USING VGG-16.

(a) CIFAR-10

MODEL CAV CRR CSR CCV COSPAT -12.0 1.7 13.7 5.7 2.6NAT -6.4 3.1 9.5 1.7 1.1

TRADES -13.6 2.4 16.0 19.9 8.2RAND -0.7 1.3 2.0 0.2 0.4

(b) SVHN

MODEL CAV CRR CSR CCV COSPAT -2.0 2.4 4.4 11.9 5.0NAT 0.1 2.2 2.1 0.1 0.2

TRADES -5.2 2.0 7.2 33.2 14.5RAND -3.5 2.1 5.6 0.6 0.3

TABLE VEXPERIMENTS RESULTS OF EBD (MEASURED AS RMS DISTANCE) ANDEBD-2 (MEASURED AS NUMBER OF ITERATIONS WITH α = 0.0005) ON

CIFAR-10 USING WIDERESNET-28 AND ON SVHN USING VGG-16.

(a) CIFAR-10

MODEL VANILLA PAT NAT TRADESEBD 10.6 37.6 27.0 36.2

EBD-2 10.2 74.9 64.8 121.8

(b) SVHN

MODEL VANILLA PAT NAT TRADESEBD 26.7 32.7 26.7 32.1

EBD-2 29.4 89.0 39.0 109.0

selected from each class, we adversarially perturb these imagesusing FGSM and PGD, respectively. We then compute andreport the neuron-coverage related metrics (KMNCov, NBCov,SNACov, TKNCov) using these test sets. The results can befound in Table VII. Further, we show the results of ALDp,ASS, and PSD on these test sets in Table IX.

In summary, we can draw conclusions as follows: (1)adversarial examples generated by `∞-norm attacks show sig-

PGD- 1 PGD- 2 PGD-0

20

40

60

80

100

ACAC

99.9 100.0 100.099.3 99.2

90.4

98.2 99.996.8

63.7

70.8

55.8

VanillaPATSATTRADES

(a) ACAC on CIFAR-10

PGD- 1 PGD- 2 PGD-0

20

40

60

80

100

ACAC

94.5100.0 99.7

52.8

95.2

68.2

85.0

100.0 99.1

42.8

70.6

42.2

VanillaPATSATTRADES

(b) ACAC on SVHN

PGD- 1 PGD- 2 PGD-0

2

4

6

8

10

12

14

ACTC

0.0 0.0 0.00.1 0.1

5.3

0.1 0.0

2.2

4.0

1.3

13.2VanillaPATSATTRADES

(c) ACTC on CIFAR-10

PGD- 1 PGD- 2 PGD-0

20

40

60

80

100

ACTC

0.8 0.0 0.25.2

0.1

9.7

1.7 0.0 0.86.0

0.5

11.2

VanillaPATSATTRADES

(d) ACTC on SVHN

PGD- 1 PGD- 2 PGD-0

20

40

60

80

100NT

E

99.8 100.0 100.098.7 98.6

83.8

97.299.9

94.8

30.4

52.9

34.1

VanillaPATSATTRADES

(e) NTE on CIFAR-10

PGD- 1 PGD- 2 PGD-0

20

40

60

80

100

NTE

90.2

99.8 99.5

31.5

91.6

52.1

74.4

100.0 98.3

22.5

59.0

23.9

VanillaPATSATTRADES

(f) NTE on SVHN

=0.01 =0.03 =0.1 =0.2 =0.30

10

20

30

40

50

60

70

ACC

VanillaPATSATTRADES

(g) different ε on CIFAR-10

=0.01 =0.03 =0.1 =0.2 =0.30

10

20

30

40

50

60

70

80

ACC

VanillaPATSATTRADES

(h) different ε on SVHN

Fig. 2. Experimental results of ACAC, ACTC, NTE, and adversarial attackswith dfferent ε.

nificantly higher neuron coverage than other perturbation types(e.g., `1 and `2), which indicate that `∞-norm attacks covermore “paths” for a DNN when perform test or evaluation; (2)meanwhile, `∞-norm attacks are more imperceptible to thehuman vision according to Table IX (lower ALDp, PSD, andhigher ASS values compared to `1 and `2 attacks).

V. DISCUSSIONS AND SUGGESTIONS

Having demonstrated extensive experiments on thesedatasets using our comprehensive evaluation framework, wenow take a further step and provide additional suggestions tothe evaluation of model robustness as well as the design ofadversarial attacks/defenses in the future.


80.0 82.5 85.0 87.5 90.0 92.5Strategy Accuracy (%)

0

5

10

15

20

25

30

%

TRADES

PATSAT

Vanilla

mCERelative mCE

(a) CIFAR-10-C

80.0 82.5 85.0 87.5 90.0 92.5Strategy Accuracy (%)

0

1

2

3

4

5

6

7

8

9

mFR

(%)

TRADES

PAT

SATVanilla

(b) CIFAR-10-P

90 91 92 93 94 95Strategy Accuracy (%)

0

2

4

6

8

10

12

14

16

18

%

TRADES

PAT

Vanilla SAT

mCERelative mCE

(c) SVHN-C

90 91 92 93 94 95Strategy Accuracy (%)

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

4.0

4.5

mFR

(%)

TRADES

PAT

Vanilla

SAT

(d) SVHN-P

Fig. 3. Experimental results of mCE, RmCE, and mFR on CIFAR-10 andSVHN corruption dataset.

0 200 400 600 800 1000Rank of Direction

20

25

30

35

40

45

50

55

60

Dist

ance

VanillaPATSATTRADES

(a)

0 200 400 600 800 1000Rank of Image

0

20

40

60

80

100

120

Dist

ance

VanillaPATSATTRADES

(b)

Fig. 4. Specific distance values on CIFAR10 related to EBD: (a) the averagedistance moved in each orthogonal direction, and (b) the Empirical BoundaryDistance moved for 1000 different images.


55

60

65

70

75

80

Dist

ance

VanillaPATSATTRADES

(a)


0

10

20

30

40

50

60

70

Dist

ance

VanillaPATSATTRADES

(b)

Fig. 5. Specific distance values on SVHN related to EBD: (a) the averagedistance moved in each orthogonal direction, and (b) the Empirical BoundaryDistance moved for 1000 different images.

A. Evaluate Model Robustness using More Attacks

For most studies in the adversarial learning literature [21],[44], [56], [57], they evaluate model robustness primarily on`∞-norm bounded PGD attacks, which has been shown tobe the most effective and representative adversarial attack.

FGSM=8/255

FGSM=16/255

PGD- 1 PGD- 2 PGD-0

200

400

600

800

1000

1200

-ENI

VanillaPATSATTRADES

(a) adversarial attacks (CIFAR-10)

GaussianNoise

ShotNoise

Impulsenoise

SpeckleNoise

GaussianBlur

DefocusBlur

0

2

4

6

8

10

12

-ENI

VanillaPATSATTRADES

(b) corruption attacks (CIFAR-10)

FGSM=8/255

FGSM=16/255

PGD- 1 PGD- 2 PGD-0

200

400

600

800

-ENI

VanillaPATSATTRADES

(c) adversarial attacks (SVHN)

GaussianNoise

ShotNoise

Impulsenoise

SpeckleNoise

GaussianBlur

DefocusBlur

0.5

1.0

1.5

2.0

2.5

-ENI

VanillaPATSATTRADES

(d) corruption attacks (SVHN)

Fig. 6. Experimental results of ε-Empirical Noise Insensitivity on CIFAR10and SVHN.

Conv1 Block1 Block2 Block3 BN1 Relu LinearLayer

0.0

0.1

0.2

0.3

0.4

Neur

on S

ensit

ivity

VanillaPATSATTRADES

(a) Neuron Sensitivity

Conv1 Block1 Block2 Block3 BN1 Relu LinearLayer

0

20

40

60

80

100

Neur

on U

ncer

tain

ty

VanillaPATSATTRADES

(b) Neuron Uncertainty

Fig. 7. Experimental results of Neuron Sensitivity and Neuron Uncertaintyon CIFAR-10. We report the mean value of the metrics for each layer.

However, according to our experimental results, we suggestto provide more comprehensive evaluations on different typesof attacks:

(1) Evaluate model robustness on `p-norm bounded adver-sarial attacks. However, as shown in Table II and III, mostadversarial defenses are designed to counteract a single typeof perturbation (e.g., small `∞-noise) and offer no guaranteesfor other perturbations (e.g., `1, `2), sometimes even increasemodel vulnerability [58], [59]. Thus, to fully evaluate adver-sarial robustness, we suggest to use `1, `2, and `∞ attacks.

(2) Evaluate model robustness on adversarial attacks aswell as corruption attacks. In addition to adversarial examples,corruption such as snow and blur also frequently occur inthe real world, which also presents critical challenges forthe building of strong deep learning models. According toour studies, deep learning models behave distinctly subhumanto input images with different corruption. Meanwhile, adver-sarially robust models may also vulnerable to corruption asshown in Fig 3. Therefore, we suggest to take both adversarialrobustness and corruption robustness into consideration, whenmeasuring the model robustness against noises.


TABLE VIEXPERIMENTS RESULTS OF KMNCOV, NBCOV, SNACOV AND TKNCOV

ON CIFAR-10 WITH WIDERESNET-28.

(a) KMNCov

MODEL FGSM PGD-`1 PGD-`2 PGD-`∞ NATTACK SPSAVANILLA 93.9 50.9 71.8 95.3 96.2 96.3

PAT 97.4 62.9 86.8 97.6 97.1 97.9NAT 98.0 56.2 77.7 97.7 97.3 97.9

TRADES 96.4 49.7 71.0 96.6 96.5 96.8

(b) NBCov


PAT 43.6 19.8 37.6 43.7 31.0 37.7NAT 41.6 13.6 32.2 43.2 36.6 38.2

TRADES 29.6 3.5 9.3 30.9 29.6 32.3

(c) SNACov


PAT 34.2 11.2 27.5 35.5 23.9 27.3NAT 36.2 8.1 21.1 37.2 30.3 31.9

TRADES 21.1 1.5 5.7 22.4 21.5 23.8

(d) TKNCov


PAT 11.4 1.9 6.3 11.4 10.1 8.7NAT 9.4 1.5 3.7 11.5 10.3 7.8

TRADES 13.6 1.7 6.8 13.6 13.3 13.0

(3) Perform black-box or gradient-free adversarial attacks,such as NATTACK, SPSA, etc. Black-box attacks are effectiveto elaborating whether obfuscated gradients [25] have beenintroduced to a specific defense. Moreover, black-box attacksare also shown to cover more neurons when perform test asshown in Table VII.

B. Evaluate Model Robustness Considering Multiple Views

To mitigate the problem brought by incomplete evaluations,we suggest to evaluate model robustness using more rigorousmetrics, which consider multi-view robustness.

(1) Consider model behaviors with respect to more pro-found metrics, e.g., prediction confidence. For example, thoughshowing high adversarial accuracy, SAT and NAT are vulnera-ble by showing high confidence of adversarial classes and lowconfidence of true classes, which are similar to vanilla models.

(2) Evaluate model robustness in terms of model structures,e.g., boundary distance. For example, though ranking firstamong other baselines on adversarial accuracy, TRADESis not strong enough in terms of Neuron Sensitivity, EBDcompared to other baselines.

C. Design of Attacks and Defenses

Besides model robustness evaluation, the proposed metricsare also beneficial to the design of adversarial attacks anddefenses. Most of these metrics provide deep investigations of

TABLE VIIEXPERIMENTS RESULTS OF KMNCOV, NBCOV, SNACOV AND TKNCOV

ON SVHN WITH VGG-16.

(a) KMNCov


PAT 84.5 40.1 64.5 86.2 88.9 86.2NAT 83.5 41.7 62.5 85.3 88.5 84.4

TRADES 82.8 53.6 70.4 83.6 83.4 84.5

(b) NBCov


PAT 37.5 19.3 38.5 41.7 37.3 38.7NAT 34.9 15.6 36.4 47.5 41.3 40.6

TRADES 36.5 19.2 32.8 38.1 24.8 38.3

(c) SNACov


PAT 23.8 7.3 22.6 30.4 28.3 27.2NAT 25.8 10.0 27.6 42.0 32.1 31.0

TRADES 26.4 11.0 20.6 28.2 16.3 28.3

(d) TKNCov


PAT 5.4 1.5 3.2 5.9 7.7 4.9NAT 5.8 1.6 3.3 8.1 7.2 6.3

TRADES 7.3 1.6 0.3 7.5 7.1 6.8

TABLE VIIIEXPERIMENTS RESULTS OF ALDp , ASS AND PSD ON CIFAR-10 WITH

WIDERESNET-28.

(a) ALDp (p = 2)

MODEL FGSM PGD-`1 PGD-`2 PGD-`∞VANILLA 0.061 1.101 0.509 0.048

PAT 0.062 1.085 0.520 0.060NAT 0.063 1.085 0.515 0.051

TRADES 0.062 1.085 0.525 0.060

(b) ASS


PAT 92.0 0.6 27.0 92.6NAT 89.8 0.6 28.1 92.8

TRADES 92.4 0.5 27.1 92.8

(c) PSD


PAT 5.832 99.0 45.4 5.6NAT 5.812 99.0 45.1 4.7

TRADES 5.797 99.2 45.9 5.6

the model behaviors or structures towards noises, which canbe used for researchers to design adversarial attack or defensemethods. Regarding metrics in terms of model structures, we


TABLE IXEXPERIMENTS RESULTS OF ALDp , ASS AND PSD ON SVHN WITH

VGG-16.

(a) ALDp (p = 2)


PAT 0.082 1.326 0.624 0.074NAT 0.080 1.329 0.621 0.053

TRADES 0.080 1.312 0.631 0.074

(b) ASS


PAT 79.2 0.3 15.9 83.2NAT 75.6 0.3 15.8 87.8

TRADES 79.3 0.3 15.1 83.0

(c) PSD


PAT 2.7 54.4 25.5 2.4NAT 2.5 54.4 25.3 2.2

TRADES 2.8 54.7 25.4 2.6

can develop new attacks or defenses by either enhancing orimpairing them, since these metrics capture the structural pat-tern that manifest model robustness. For example, to improvemodel robustness, we can constrain the value of ENI, NeuronSensitivity, and Neuron Uncertainty.

VI. AN OPEN-SOURCED PLATFORM

To fully support our multi-view evaluation and facilitatefurther research, we provide an open-sourced platform referredas AISafety2 based on Pytorch. Our platform contains severalhighlights as follows:

(1) Multi-language environment. To facilitate the user flex-ibility, our platform supports the use of language-independentmodels (e.g., Java, C, Python, etc.). To achieve the goal,we establish the standardized input and output systems withuniform format. Specifically, we use the docker container toencapsulate the model input and output so that users couldload their models freely.

(2) High extendibility. Our platform also supports continu-ous integration of user-specific algorithms and models. In otherwords, users are able to introduce externally personal-designedattack, defense, evaluation methods, by simply inheriting thebase classes through several public interfaces.

(3) Multiple scenarios. Our platform integrates multiplereal-world application scenarios, e.g., auto-driving, automaticcheck-out, interactive robots.

Our platform consists of five main components: Attack mod-ule, Defense module, Evaluation module, Prediction module,and Database module. As shown in Figure 8, the Predictionmodule executes the model (might be trained using defensestrategies from the Defense module) on a specific dataset in

2https://git.openi.org.cn/OpenI/AISafety

Prediction Module

Original Data

Models

WEB API

Java

C++

Ruby

Attack Module

Corruption Attack19 Methods

Adversarial Attack16 Methods

Defense Module

Evaluation Module

Model-Oriented Evaluation Metrics

17 Evaluation Metrics

Data-Oriented Evaluation Metrics

6 Evaluation Metrics

Adversarial Defense

10 Defense Methods

Adversarial Examples

Corruption Examples

RobustModels

Fig. 8. The framework of our open source platform AISafety, which consistsof five main components: Attack module, Defense module, Evaluation module,Prediction module, and Database module.

TABLE XTHE COMPARISON OF AISAFETY AND OTHER OPEN-SOURCED

PLATFORMS.

ATTACKSDEFENSES

ROBUSTNESSEVALUATION

STATIC/DYNAMICANALYSIS

COMPETITIONMULTIPLE

SCENARIOSCLEVERHANS [60] X

FOOLBOX [61] XDEEPSEC [38] X X

DEEPXPLORE [62] XDEEPTEST [63] X XREALSAFE [27] X X

AISAFETY (OURS) X X X X X

the Database module using attacks from the Attack modulewith evaluation metrics selected from the Evaluation module.

(1) Attack module is used for generating adversarial ex-amples and corruption attacks, which contains 15 adversarialattacks and 19 corruption attacks.

(2) Defense module provides 10 adversarial defense strate-gies which could be used to improve model robustness.

(3) Evaluation module is used to evaluate the model robust-ness considering both the data and model aspect of the issue,which contains 23 different evaluation metrics.

(4) Prediction module executes the models and standardizethe model input and output using specific attacks and evalua-tion metrics.

(5) Database module collects several datasets and pre-trained models, which can be used for evaluation.

To make the platform flexible and user-friendly, we de-couple each part of the whole evaluation process. Users areable to customize their evaluation process by switching attackmethods, defense methods, evaluation methods, and modelsthrough simple parameter modification.

In contrast to other open-sourced platforms, as shown inTable X, our AISafety enjoys the advantages of static/dynamicanalysis, robustness evaluation, etc.

VII. CONCLUSION

Most of current defenses only conduct incomplete evalua-tions, which are far from providing comprehensive understand-ings for the limitations of these defenses. Thus, most proposeddefenses are quickly shown to be attacked successfully, whichresult in the “arm race” phenomenon between attack anddefense. To mitigate this problem, we establish a model

https://git.openi.org.cn/OpenI/AISafety


robustness evaluation framework containing a comprehensive,rigorous, and coherent set of evaluation metrics, which couldfully evaluate model robustness and provide deep insightsinto building robust models. Our framework primarily focuson the two key factors of adversarial learning (i.e., dataand model), and provide 23 evaluation metrics consideringmultiple aspects such as neuron coverage, data impercepti-bility, decision boundary distance, adversarial performance,etc. We conduct large scale experiments on multiple datasetsincluding CIFAR-10 and SVHN using different models anddefenses with our open-source platform AISafety, and provideadditional suggestions to model robustness evaluation as wellas attack/defense designing.

The objective of this work is to provide a comprehensiveevaluation framework which could conduct more rigorousevaluations of model robustness. We hope our paper canfacilitate fellow researchers for a better understanding of theadversarial examples as well as further improvement of modelrobustness.

REFERENCES

[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classificationwith deep convolutional neural networks,” in International Conferenceon Neural Information Processing Systems, 2012.

[2] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation byjointly learning to align and translate,” arXiv preprint arXiv:1409.0473,2014.

[3] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed, N. Jaitly,A. Senior, V. Vanhoucke, P. Nguyen, and T. N. Sainath, “Deep neuralnetworks for acoustic modeling in speech recognition: The shared viewsof four research groups,” IEEE Signal Processing Magazine, 2012.

[4] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow,and R. Fergus, “Intriguing properties of neural networks,” arXiv preprintarXiv:1312.6199, 2013.

[5] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial examples in thephysical world,” arXiv preprint arXiv:1607.02533, 2016.

[6] A. Liu, X. Liu, J. Fan, Y. Ma, A. Zhang, H. Xie, and D. Tao,“Perceptual-sensitive gan for generating adversarial patches,” in 33rdAAAI Conference on Artificial Intelligence, 2019.

[7] A. Liu, T. Huang, X. Liu, Y. Xu, Y. Ma, X. Chen, S. Maybank, andD. Tao, “Spatiotemporal attacks for embodied agents,” in EuropeanConference on Computer Vision, 2020.

[8] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessingadversarial examples,” arXiv preprint arXiv:1412.6572, 2014.

[9] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, andA. Swami, “Practical black-box attacks against deep learning systemsusing adversarial examples,” arXiv preprint arxiv:1602.02697, 2016.

[10] Y. Dong, F. Liao, T. Pang, and H. Su, “Boosting adversarial attackswith momentum,” in IEEE Conference on Computer Vision and PatternRecognition, 2018.

[11] A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok, “Synthesizing robustadversarial examples,” arXiv preprint arXiv:1707.07397, 2017.

[12] A. Liu, J. Wang, X. Liu, b. Cao, C. Zhang, and H. Yu, “Bias-baseduniversal adversarial patch attack for automatic check-out,” in EuropeanConference on Computer Vision, 2020.

[13] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towardsdeep learning models resistant to adversarial attacks,” in InternationalConference on Learning Representations, 2018.

[14] F. Liao, M. Liang, Y. Dong, T. Pang, X. Hu, and J. Zhu, “Defense againstadversarial attacks using high-level representation guided denoiser,” inIEEE Conference on Computer Vision and Pattern Recognition, 2018.

[15] J. Buckman, A. Roy, C. Raffel, and I. Goodfellow, “Thermometerencoding: One hot way to resist adversarial examples,” in InternationalConference on Learning Representations, 2018.

[16] Z. Yan, Y. Guo, and C. Zhang, “Deep defense: Training dnns withimproved adversarial robustness,” in Advances in Neural InformationProcessing Systems, 2018.

[17] M. Cisse, P. Bojanowski, E. Grave, Y. Dauphin, and N. Usunier,“Parseval networks: Improving robustness to adversarial examples,” inInternational Conference on Machine Learning, 2017.

[18] F. Tramer, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, andP. McDaniel, “Ensemble adversarial training: Attacks and defenses,”arXiv preprint arXiv:1705.07204, 2017.

[19] T. Li, A. Liu, X. Liu, Y. Xu, C. Zhang, and X. Xie, “Understandingadversarial robustness via critical attacking route,” Information Sciences,2021.

[20] N. Papernot, P. Mcdaniel, X. Wu, S. Jha, and A. Swami, “Distillationas a defense to adversarial perturbations against deep neural networks,”arXiv preprint arXiv:1511.04508, 2015.

[21] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. Yuille, “Mitigating adversarialeffects through randomization,” in International Conference on LearningRepresentations, 2018.

[22] C. Nicholas and W. David, “Defensive distillation is not robust toadversarial examples,” arXiv preprint arXiv:1607.04311, 2016.

[23] ——, “Towards evaluating the robustness of neural networks,” in IEEESymposium on Security and Privacy, 2017.

[24] ——, “Is ami attacks meet interpretability robust to adversarial exam-ples,” arXiv preprint arXiv:1902.02322, 2019.

[25] A. Athalye, N. Carlini, and D. Wagner, “Obfuscated gradients give afalse sense of security: Circumventing defenses to adversarial examples,”arXiv preprint arXiv:1802.00420, 2018.

[26] N. Carlini, A. Athalye, N. Papernot, W. Brendel, J. Rauber, D. Tsipras,I. Goodfellow, and A. Madry, “On evaluating adversarial robustness,”arXiv preprint arXiv:1902.06705, 2019.

[27] D. Yinpeng, F. Qi-An, Y. Xiao, P. Tianyu, S. Hang, X. Zihao, and Z. Jun,“Benchmarking adversarial robustness on image classification,” in IEEEConference on Computer Vision and Pattern Recognition, 2020.

[28] D. Yinpeng, S. Hang, W. Baoyuan, L. Zhifeng, L. Wei, Z. Tong, andZ. Jun, “Efficient decision-based black-box adversarial attacks on facerecognition,” in IEEE International Conference on Computer Vision,2019.

[29] M.-D. Seyed-Mohsen, F. Alhussein, and F. Pascal, “Deepfool: a simpleand accurate method to fool deep neural networks,” in IEEE Interna-tional Conference on Computer Vision, 2016.

[30] P.-Y. Chen, H. Zhang, Y. Sharma, J. Yi, and C.-J. Hsieh, “Zoo: Zerothorder optimization based black-box attacks to deep neural networkswithout training substitute models,” in 10th ACM Workshop on ArtificialIntelligence and Security, 2017.

[31] Y. Li, L. Li, L. Wang, T. Zhang, and B. Gong, “Nattack: Learning thedistributions of adversarial examples for an improved black-box attackon deep neural networks,” in International Conference on MachineLearning, 2019.

[32] J. Uesato, B. O’Donoghue, A. van den Oord, and P. Kohli, “Adversarialrisk and the dangers of evaluating against weak attacks,” in InternationalConference on Machine Learning, 2018.

[33] W. Brendel, J. Rauber, and M. Bethge, “Decision-based adversarialattacks: Reliable attacks against black-box machine learning models,”in International Conference on Learning Representations, 2018.

[34] G. K. Dziugaite, G. Zoubin, and D. M. Roy, “A study of the effect of jpgcompression on adversarial images,” arXiv preprint arXiv:1608.00853,2016.

[35] F. Croce and M. Hein, “Provable robustness against all adversarial lp-perturbations for p ≥ 1,” in International Conference on LearningRepresentations, 2020.

[36] R. Aditi, S. Jacob, and L. Percy, “Semidefinite relaxations for certifyingrobustness to adversarial examples,” in Advances in Neural InformationProcessing Systems, 2018.

[37] ——, “Certified defenses against adversarial examples,” in InternationalConference on Learning Representations, 2018.

[38] L. Xiang, J. Shouling, Z. Jiaxu, W. Jiannan, W. Chunming, L. Bo, andW. Ting, “Deepsec: A uniform platform for security analysis of deeplearning model,” in 2019 IEEE Symposium on Security and Privacy (SP),year=2019.

[39] L. Ma, F. Juefei-Xu, F. Zhang, J. Sun, M. Xue, B. Li, C. Chen, T. Su,L. Li, Y. Liu, J. Zhao, and Y. Wang, “Deepgauge: Multi-granularity test-ing criteria for deep learning systems,” in 33rd ACM/IEEE InternationalConference on Automated Software Engineering, 2018.

[40] S. H. R. e. a. Wang Z, Bovik A C, “Image quality assessment: fromerror visibility to structural similarity,” IEEE transactions on imageprocessing, 2004.

[41] B. Luo, Y. Liu, L. Wei, and Q. Xu, “Towards imperceptible and robustadversarial example attacks against neural networks,” Proceedings ofthe AAAI Conference on Artificial Intelligence, vol. 32, no. 1, Apr.2018. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/11499

[42] F. A. C. M. Do T D, Hui S C, “Prediction confidence for associativeclassification,” 2005.

https://ojs.aaai.org/index.php/AAAI/article/view/11499

https://ojs.aaai.org/index.php/AAAI/article/view/11499


[43] D. Hendrycks and T. Dietterich, “Benchmarking neural network ro-bustness to common corruptions and perturbations,” in InternationalConference on Learning Representations, 2019.

[44] A. Liu, X. Liu, C. Zhang, H. Yu, Q. Liu, and J. He, “Training robustdeep neural networks via adversarial noise propagation,” arXiv preprintarXiv:1909.09034, 2019.

[45] C. Zhang, A. Liu, X. Liu, Y. Xu, H. Yu, Y. Ma, and T. Li, “Interpretingand improving adversarial robustness of deep neural networks withneuron sensitivity,” IEEE Transactions on Image Processing, vol. 30,pp. 1291–1304, 2021.

[46] B. T. T. T. M. . S. C. Myers, G. J., “The art of software testing,” inChichester, 2004.

[47] G. E. Legge and J. M. Foley, “Contrast masking in human vision,” Josa,vol. 70, no. 12, pp. 1458–1471, 1980.

[48] A. Liu, W. Lin, M. Paul, C. Deng, and F. Zhang, “Just noticeabledifference for images with decomposition model for separating edgeand textured regions,” IEEE Transactions on Circuits and Systems forVideo Technology, 2010.

[49] C. Cortes and V. Vapnik, “Support-vector networks,” Machine learning,1995.

[50] G. Elsayed, D. Krishnan, H. Mobahi, K. Regan, and S. Bengio, “Largemargin deep networks for classification,” in Advances in Neural Infor-mation Processing Systems, 2018.

[51] W. He, B. Li, and D. Song, “Decision boundary analysis of adversarialexamples,” in International Conference on Learning Representations,2018.

[52] H. Xu and S. Mannor, “Robustness and generalization,” Machine learn-ing, 2012.

[53] S. Zagoruyko and N. Komodakis, “Wide residual networks,” in TheBritish Machine Vision Conference, 2016.

[54] K. Simonyan and A. Zisserman, “Very deep convolutional networks forlarge-scale image recognition,” International Conference on LearningRepresentations, 2015.

[55] H. Zhang, Y. Yu, J. Jiao, E. P. Xing, L. E. Ghaoui, and M. I. Jordan,“Theoretically principled trade-off between robustness and accuracy,”2019.

[56] T. Zhang and Z. Zhu, “Interpreting adversarially trained convolutionalneural networks,” arXiv preprint arXiv:1905.09797, 2019.

[57] C. Xie and A. Yuille, “Intriguing properties of adversarial training atscale,” in International Conference on Learning Representations, 2020.

[58] F. Tramer and D. Boneh, “Adversarial training and robustness formultiple perturbations,” in Advances in Neural Information ProcessingSystems, 2019.

[59] M. A. Pratyush, W. Eric, and K. J. Zico, “adversarial robustness againstthe union of multiple perturbation model,” in International Conferenceon Machine Learning, 2020.

[60] N. Papernot, I. Goodfellow, R. Sheatsley, R. Feinman, and P. McDaniel,“cleverhans v2. 0.0: an adversarial machine learning library,” arXivpreprint arXiv:1610.00768, vol. 10, 2016.

[61] J. Rauber, W. Brendel, and M. Bethge, “Foolbox: A python toolbox tobenchmark the robustness of machine learning models,” 2017.

[62] K. Pei, Y. Cao, J. Yang, and S. Jana, “Deepxplore: Automated whiteboxtesting of deep learning systems,” in proceedings of the 26th Symposiumon Operating Systems Principles, 2017.

[63] Y. Tian, K. Pei, S. Jana, and B. Ray, “Deeptest: Automated testingof deep-neural-network-driven autonomous cars,” in Proceedings of the40th international conference on software engineering, 2018.

Documents

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. , NO