MalGene: Automatic Extraction of Malware Analysis Evasion Signature

Dhilung Kirat
University of California, Santa Barbara
[email protected]

Giovanni Vigna
University of California, Santa Barbara
[email protected]

ABSTRACT

Automated dynamic malware analysis is a common approach for detecting malicious software. However, many malware samples identify the presence of the analysis environment and evade detection by not performing any malicious activity. Recently, an approach to the automated detection of such evasive malware was proposed. In this approach, a malware sample is analyzed in multiple analysis environments, including a bare-metal environment, and its various behaviors are compared. Malware whose behavior deviates substantially is identified as evasive malware. However, a malware analyst still needs to re-analyze the identified evasive sample to understand the technique used for evasion. Different tools are available to help malware analysts in this process. However, these tools in practice require considerable manual input along with auxiliary information. This manual process is resource-intensive and not scalable.

In this paper, we present MalGene, an automated technique for extracting analysis evasion signatures. MalGene leverages algorithms borrowed from bioinformatics to automatically locate evasive behavior in system call sequences. Data flow analysis and data mining techniques are used to identify call events and data comparison events used to perform the evasion. These events are used to construct a succinct evasion signature, which can be used by an analyst to quickly understand evasions. Finally, evasive malware samples are clustered based on their underlying evasive techniques. We evaluated our techniques on 2810 evasive samples. We were able to automatically extract their analysis evasion signatures and group them into 78 similar evasion techniques.

Categories and Subject Descriptors

C.2.0 [Computer-Communication Networks]: General—Security and protection; D.4.6 [Software Engineering]: Security and Protection—Invasive software (malware); J.3 [Computer Applications]: Life and Medical Sciences—Biology and genetics

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

CCS’15, October 12–16, 2015, Denver, Colorado, USA.

Copyright is held by the owner/author(s). Publication rights licensed to ACM.

ACM 978-1-4503-3832-5/15/10 ...$15.00.

DOI: http://dx.doi.org/10.1145/2810103.2813642 .

Keywords

computer security; malware analysis; evasive malware; sequence alignment; bioinformatics

1. INTRODUCTION

Automated dynamic malware analysis is a common approach for analyzing and detecting a wide variety of malicious software. Dynamic analysis systems have become more popular because signature-based and static-analysis-based detection approaches are easily evaded using widely available techniques such as obfuscation, polymorphism, and encryption. However, many malware samples identify the presence of the analysis environment and evade detection by avoiding the execution of suspicious operations. Malware authors have developed several ways to detect the presence of malware analysis systems [13, 25, 26, 28, 29]. The most common approach is based on the inspection of specific artifacts related to the analysis systems. This includes checking for the presence of registry keys or I/O ports, background processes, function hooks, or IP addresses that are specific to some known malware analysis service. For example, malware running inside a VirtualBox guest operating system can simply inspect VirtualBox-specific service names, or the hardware IDs of the available virtual devices, and check for the substring VBOX. Another approach to evasion is to fingerprint the underlying CPU that is executing the malware. For example, fingerprinting can be achieved by detecting differences in the timing of the execution of certain instructions, or a small variation in the CPU execution semantics [25, 29].

Recently, an approach to the automated detection of evasive malware has been proposed [17]. In this approach, malware is executed in a bare-metal execution environment as well as in environments that leverage virtualization and emulation. Malware behaviors are extracted from these executions and compared to detect deviations in the behavior, under the assumption that bare-metal execution represents the “real” behavior of the malware. Malware whose behavior deviates substantially among the execution environments is labeled as evasive malware. This way, evasive malware is identified without knowing the underlying evasion technique. This approach requires each malware sample to be run on a bare-metal environment. However, compared to a bare-metal environment, emulated and virtualized environments are easier to scale, and they provide far better control and visibility over malware execution. For these practical reasons, emulated or virtualized sandboxes are widely used for large-scale automated malware analysis. However, keeping emulated and virtualized sandboxes resistant to evolving evasion techniques is a current industry challenge. To combat sandbox evasion attacks, a complete understanding of evasion techniques is the first fundamental step, as this knowledge can help “fix” sandboxes and make them robust against evasion attacks. Currently, understanding evasion techniques is largely a manual process.

Several analysis tools are available to help analyze malware behavior differences [7, 15]. These tools are effective in performing manual, fine-grained analysis of evasive malware. However, they require additional auxiliary information, such as a set of system calls corresponding to malicious behavior or the selection of control-flow differences. Finding this auxiliary information is a manual process that is resource-intensive and not scalable. However, performing such analysis on a large scale is necessary to combat rapidly evolving evasion attacks.

In general, the manual process required to understand an evasion instance starts from two system call traces of the same malware sample, recorded in two different execution environments. The malware sample evades one of the environments, creating a difference between the system call sequences. The first step of the evasion analysis involves finding the location in the system call traces where the execution deviates due to evasion. After accurately locating the deviation, understanding the evasion requires identifying environment-specific artifacts that are used for fingerprinting the analysis environment. In the first step, manually finding the location of the deviation in the system call sequence can be difficult. The naive approach of looking for the first call that differs between the two sequences does not work. System call traces are usually noisy, and there can be thousands of events in the sequence. Even when running the same program in exactly the same environment twice, the system call traces can be quite different. Thread scheduling is one of the main reasons for these differences; however, other factors, such as operating-system and library-specific aberrations, initialization characteristics, and timing, can play a substantial role. Another approach would be to take a diff of the sequences, under the assumption that there will be a large gap in the alignment corresponding to the evasion in one of the environments. However, this approach may not accurately align the sequences. A generic diff algorithm finds the longest common subsequence (LCS) of the sequences. This approach is effective when large portions of the sequences have unique alphabets, such as the lines of source code. However, a system call sequence has a limited alphabet, while the sequence itself is usually long. Because of this, instead of forming a gap, some subsequences of system calls corresponding to malicious behavior are likely to align with the other sequence, in which the malicious behavior is absent.

In this paper, we present MalGene, an automatic technique for extracting human-readable evasion signatures from evasive malware. MalGene leverages local sequence alignment techniques borrowed from bioinformatics to automatically locate evasions in a system call sequence. Such sequence alignment techniques are widely used for aligning long sequences of DNA or proteins [11, 14, 27]. These algorithms are known to be effective even if there are large gaps and the size of the alphabet is limited, such as the alphabet of four bases in the case of a DNA sequence: Thymine (T), Adenine (A), Cytosine (C), and Guanine (G). We use data flow analysis and inverse document frequency-based techniques to automatically identify call events and data comparisons used by the evasion techniques. We build evasion signatures from these identified events. Finally, malware samples are clustered based on their underlying evasive techniques.

Our work makes the following contributions:

• We present MalGene, a system for automatically extracting evasion signatures from evasive malware. Our system leverages a combination of data mining and data flow analysis techniques to automate the signature extraction process, which can be applied to a large-scale sample set.

• We propose a novel bioinformatics-inspired approach to system call sequence alignment for locating evasions. The proposed algorithm performs deduplication and difference pruning, and can handle branched sequences.

• We evaluated our techniques on 2810 evasive samples. We were able to automatically extract their analysis evasion signatures and group them into 78 similar evasion techniques.

2. EVASION SIGNATURE MODEL

In general, malware evades analysis in two steps. First, it extracts information about the execution environment. Second, it performs some comparison on the extracted information to decide whether to evade or not. Usually, malware uses system calls and user-mode API calls in the first step to probe the execution environment. In the second step, it uses some predefined constant values or information extracted from previous system or user API calls. With this generalization, we define an evasion signature as a set of system call events, user API call events, and comparison events that are used as the basis for evading the analysis system. A comparison event is an execution of a comparison instruction, such as a CMP instruction in the x86 instruction set. Usually, the execution of one such instruction is necessary to make the control flow decision during evasion, which is the second step of the evasion process described above.

Formally, let P be the set of all call events (both system calls and API calls) and Q be the set of all comparison events that are used by an evasion technique; we define the evasion signature ∆ of this technique as:

∆ = P ∪ Q

We represent a call event p ∈ P as a pair (name(p), attrib(p)), where name(p) represents the name of the call, e.g., NtCreateFile, and attrib(p) represents the name of the operating system object associated with the call, e.g., C:/boot.ini. We represent a comparison event q ∈ Q as a pair (p, v), where p is a call event that produced the information in the first operand compared by event q, and v represents either some constant value used in the second operand, or another call event that produced the information for the second operand.
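The signature model above maps naturally onto a small data structure. The following is a minimal sketch, not the authors' implementation: the class names `CallEvent` and `ComparisonEvent`, the helper `evasion_signature`, and the example values are all hypothetical and chosen only to mirror the pairs (name(p), attrib(p)) and (p, v).

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Union


@dataclass(frozen=True)
class CallEvent:
    """A call event p = (name(p), attrib(p))."""
    name: str                      # e.g. "NtCreateFile"
    attrib: Optional[str] = None   # associated OS object name, if any


@dataclass(frozen=True)
class ComparisonEvent:
    """A comparison event q = (p, v)."""
    p: CallEvent                   # call that produced the first operand
    v: Union[CallEvent, str, int]  # constant, or call that produced operand two


Event = Union[CallEvent, ComparisonEvent]


def evasion_signature(P: Iterable[CallEvent],
                      Q: Iterable[ComparisonEvent]) -> FrozenSet[Event]:
    """Return the signature as the set union of call and comparison events."""
    return frozenset(P) | frozenset(Q)


# Illustrative values only: a registry query whose result is compared
# against a constant string.
probe = CallEvent("NtQueryValueKey", ".../SystemBiosVersion")
check = ComparisonEvent(probe, "VBOX")
signature = evasion_signature({probe}, {check})
```

Frozen dataclasses are hashable, which is what allows the events to be stored in a set and compared across samples.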

We extract the evasion signature ∆ of an evasive malware sample in two steps. In the first step, we locate the evasion in the call sequences resulting from the execution of the malware in different environments, as described in Section 3.2.3. In the second step, we identify the elements of ∆ used for the evasion, as described in Section 4.


The model defined by ∆ only captures those evasion techniques that must trigger some system or user API calls. The majority of known evasion techniques fall into this category. Some techniques may not directly make a system or API call, such as forced exception-based CPU fingerprinting [25]. However, such techniques indirectly trigger calls to exception handlers, which are captured by P. In the case of an emulated CPU, on the other hand, there are known evasion techniques that are based entirely on the inspection of the FPU, memory, or register state after the execution of certain instructions. Other evasion techniques are based on stalling code. Our current model does not capture such evasion techniques.

3. SEQUENCE ALIGNMENT

The input to our system is a set of evasive malware samples detected by an automatic evasion detection system called BareCloud [17]. BareCloud provides information about which of the analysis environments a malware sample evades. To extract the evasion signature of an evasive malware sample, we analyze the sample in two analysis environments, where it evades one of the environments while showing malicious activity in the other. In the first step, we start from the two sequences of system call events from these two analysis environments. Because the system calls related to the malicious activities are entirely missing in one of the sequences, there must be an observable deviation between the two sequences. The goal here is to efficiently and accurately find the location of the deviation in the sequence corresponding to the evasion. To do this, we first align the two sequences starting from the beginning, introducing gaps as required for an optimal alignment. We locate the deviation by finding the largest gap in the aligned sequence. We consider this location to be the evasion point. The malware activity differs significantly after this point, implying evasion.
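To make the notion of "largest gap" concrete, the following sketch scans an already-computed alignment for the longest run of gap columns. It assumes a representation that the paper does not specify: an alignment is a list of column pairs, with `None` marking a gap on one side. The names `gap_side` and `largest_gap` are hypothetical.

```python
from typing import List, Optional, Tuple

Column = Tuple[Optional[str], Optional[str]]  # (event in A, event in B)


def gap_side(col: Column) -> Optional[str]:
    """Return 'A' if A has a gap in this column, 'B' if B has one, else None."""
    a, b = col
    if a is None and b is not None:
        return "A"
    if b is None and a is not None:
        return "B"
    return None


def largest_gap(alignment: List[Column]) -> Tuple[int, int, str]:
    """Return (start column, length, side) of the longest same-side gap run."""
    best = (0, 0, "")
    i = 0
    while i < len(alignment):
        side = gap_side(alignment[i])
        if side is None:
            i += 1
            continue
        j = i
        while j < len(alignment) and gap_side(alignment[j]) == side:
            j += 1
        if j - i > best[1]:
            best = (i, j - i, side)
        i = j
    return best


# Toy alignment: A is missing three events that appear only in B.
toy = [("x", "x"), (None, "m1"), (None, "m2"), (None, "m3"),
       ("y", "y"), ("z", None)]
start, length, side = largest_gap(toy)
```

Under this representation, the column index `start` plays the role of the evasion point: everything aligned before it is common to both executions.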

The intuition here is that an evasive malware sample must perform its evasion “check” in both environments before the evasion point. Once we locate the evasion point, we extract the evasion signature from the detailed analysis log, which contains user API calls and comparison events, as described in Section 4. Note that only system-call level monitoring is required for locating the evasion point. This is advantageous because the monitoring of user API calls and comparison events may not be available in both analysis environments. However, most of the existing malware analysis systems are capable of producing system-call level execution profiles.

Apart from malware evasion, other factors can cause deviations in the malware execution. We followed all strategies proposed in BareCloud [17] to limit deviations due to external factors. That is, we used an identical local network and identical internal software configurations for all execution environments. We executed each malware sample in both environments at the same time to mitigate date- and time-related deviations. We used network service filters to provide consistent responses to DNS and SMTP communications for all environments.

One simple approach to finding the largest gap in the alignment of system call sequences would be to take a diff of the sequences. However, the generic diff algorithm finds the longest common subsequence (LCS) of the sequences, which may not accurately align the sequences in our context. This is because a) a system call sequence is usually a long series of events drawn from a limited alphabet, e.g., around 300 system calls on the Windows platform, and b) the difference between the sequences tends to be large. Combining system call names with their arguments can increase the size of the alphabet. However, there are frequent system calls, such as NtAllocateVirtualMemory, that act on unnamed OS objects or nondeterministic argument values. Using nondeterministic argument values, such as memory addresses, creates too many undesirable mismatches, resulting in a poor alignment. In such cases, we discard the attrib() values of the system call events to get a more stable alignment.

To illustrate the problem with LCS, let us take example sequences A and B as shown in Figure 1(a). Here, sequences A and B are system call sequences of the same malware sample when executed in two different execution environments. Sequence A corresponds to the execution environment where the malware evades analysis, while sequence B corresponds to the execution environment where the malware shows its malicious activity. The “malicious section” of sequence B, corresponding to the malicious activity of the malware sample, is illustrated with a darker background. This malicious section is missing in sequence A because the malware sample evades analysis. In this example, the LCS-based alignment matches the first three calls from A1 with B1, as expected. However, the rest of sequence A is matched with common subsequences from the malicious section of B to maximize the length of the common subsequence. In this case, the alignment is algorithmically optimal but semantically incorrect. This is likely to happen because the malicious sections are usually long and the alphabet is limited in size. Note that the system call NtTerminateProcess does not align because such an alignment would result in a shorter common subsequence. However, the alignment of important call events is critical for accurately locating the evasion point. This LCS-based alignment example shows that the longest common subsequence may not always produce the most meaningful alignment of the system call sequences.

To address this problem, we propose to apply sequence alignment algorithms borrowed from bioinformatics. Such algorithms are used to identify regions of similarity in sequences of DNA, RNA, or proteins [11, 14, 27]. These regions of similarity usually correspond to evolutionary relationships between the sequences [22]. In the case of system call sequences, such similarity regions correspond to the execution of similar code or the same high-level library functions. While aligning system call sequences, the alignments of some system calls are more critical than others, such as the alignment of NtTerminateProcess in Figure 1(a), because they represent important events in the program execution. Sequence alignment algorithms from bioinformatics can prioritize such critical alignments. Furthermore, these algorithms support more versatile similarity scores among system calls, which can produce a better approximation of the alignments in the presence of noise in the sequences.

There are two approaches to sequence alignment: Global Alignment and Local Alignment. In the next section, we briefly describe these approaches.

3.1 Global and Local Alignments

When finding alignments, global alignment algorithms, such as Needleman-Wunsch [24], take the entirety of both sequences into consideration. Global alignment is a form of global optimization that forces the alignment to span the entire length [27]. This approach is useful when there is no deviation in the malware behavior, or the deviation is minimal.

[Figure 1: Sequence Alignments. Panel (a): Diff (LCS). Panel (b): Local Alignment. Each panel shows system call sequence A (segments A1, A2) aligned against sequence B, with Start, Evasion, and End markers.]

Local alignment algorithms, such as Smith-Waterman [30], tend to find good matches of local subsequences between two sequences. Hence, these algorithms identify regions of similarity within long sequences that are often widely divergent overall. This approach is better if there are large missing parts in the sequence. This is true for a system call sequence corresponding to evasion, such as sequence A in Figure 1(b), which is missing system calls corresponding to B2, the malicious section of B. For this reason, we use a Local Alignment algorithm for aligning system call sequences. Figure 1(b) represents the alignment using a local alignment algorithm. Notice that there is no undesirable alignment with the malicious section of sequence B. The NtTerminateProcess system call is aligned even though the total number of matches is smaller compared to the LCS-based alignment (8 vs. 9 matches). The alignment in Figure 1(b) is clearly the better alignment for locating the evasion point compared to the LCS-based alignment in Figure 1(a).

3.1.1 Local Alignment

In this section, we briefly describe the Smith-Waterman [30] local alignment algorithm.

Given two sequences $A = a_1, a_2, \ldots, a_n$ and $B = b_1, b_2, \ldots, b_m$ of length $n$ and $m$ respectively, a maximum similarity matrix $H$ is computed using the following induction:

$$H(i, 0) = 0,\; 0 \le i \le n, \qquad H(0, j) = 0,\; 0 \le j \le m,$$

and

$$H(i, j) = \max \left\{ \begin{array}{l} 0 \\ H(i-1, j-1) + \mathit{Sim}(a_i, b_j) \\ \max_{k \ge 1} \{ H(i-k, j) + W_k \} \\ \max_{l \ge 1} \{ H(i, j-l) + W_l \} \end{array} \right\}, \qquad 1 \le i \le n,\; 1 \le j \le m$$

where $a$ and $b$ are strings over the alphabet $\Sigma$, $\mathit{Sim}(a, b)$ is a similarity score function on the alphabet, and $W_i$ is the gap penalty schema, i.e., the (negative) score added for a gap of length $i$. Here, $H(i, j)$ represents the maximum similarity score between suffixes of $[a_1, a_2, \ldots, a_i]$ and $[b_1, b_2, \ldots, b_j]$.

To obtain the optimal local alignment, backtracking is performed starting from the highest value in the matrix $H(i, j)$. We used a scalable implementation of the local alignment algorithm [14]. We provide more information about the similarity score function and gap penalty schema in the next sections.
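The recurrence can be transcribed almost literally into code. The sketch below is a direct, unoptimized transcription for illustration only; it is not the scalable implementation cited in the text, and it runs in cubic time because every gap length is tried at every cell. The function name `smith_waterman` and the callback parameters `sim` and `gap` (returning the negative penalty for a gap of a given length) are assumptions of this example.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Column = Tuple[Optional[str], Optional[str]]  # None marks a gap


def smith_waterman(
    A: Sequence[str],
    B: Sequence[str],
    sim: Callable[[str, str], float],
    gap: Callable[[int], float],
) -> Tuple[float, List[Column]]:
    """Return the best local score and the aligned columns."""
    rows, cols = len(A), len(B)
    H = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    back = [[None] * (cols + 1) for _ in range(rows + 1)]  # predecessor cells
    best, best_pos = 0.0, (0, 0)
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            score, prev = 0.0, None  # the "0" branch restarts the alignment
            diag = H[i - 1][j - 1] + sim(A[i - 1], B[j - 1])
            if diag > score:
                score, prev = diag, (i - 1, j - 1)
            for k in range(1, i + 1):        # gap of length k in B
                s = H[i - k][j] + gap(k)
                if s > score:
                    score, prev = s, (i - k, j)
            for l in range(1, j + 1):        # gap of length l in A
                s = H[i][j - l] + gap(l)
                if s > score:
                    score, prev = s, (i, j - l)
            H[i][j], back[i][j] = score, prev
            if score > best:
                best, best_pos = score, (i, j)
    # Backtrack from the highest-scoring cell until a zero cell is reached.
    columns: List[Column] = []
    i, j = best_pos
    while back[i][j] is not None:
        pi, pj = back[i][j]
        if pi == i - 1 and pj == j - 1:
            columns.append((A[i - 1], B[j - 1]))
        elif pj == j:
            columns.extend((A[r - 1], None) for r in range(i, pi, -1))
        else:
            columns.extend((None, B[c - 1]) for c in range(j, pj, -1))
        i, j = pi, pj
    columns.reverse()
    return best, columns


def toy_sim(a: str, b: str) -> float:
    return 3.0 if a == b else -3.0


def toy_gap(length: int) -> float:
    return -1.0 - 0.5 * length


score, aligned = smith_waterman(list("abcd"), list("abxcd"), toy_sim, toy_gap)
```

With these toy scores, the single unmatched event `x` in the second sequence is absorbed as a one-column gap rather than breaking the alignment into two pieces.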

3.2 System Call Alignment

A system call sequence consists of a sequence of system call events. While the order of a biological sequence represents a structural property, the order of a system call sequence represents the temporal execution order. The order of system call events has stronger significance when events are interdependent. For example, in order to create a thread in a foreign process to run arbitrary code, one must follow a certain order of system calls. Even with the insertion of gaps, sequence alignment preserves this order while aligning sequences.

3.2.1 Similarity Score

One of the most important parts of the sequence alignment algorithm is the similarity-scoring schema. Based on domain knowledge, the scoring schema computes a similarity score between two elements in the sequence. A straightforward approach would be to simply assign a value µ > 0 for a match and σ < 0 for a mismatch. Values of µ and σ can be constant values or they may depend on the pair of sequence elements being compared.

There are many studies on modeling similarity schemata for biological sequence alignment [3, 9]. These schemata are based on biological evidence, where a mismatch is treated as a mutation. In general, the match score µ is based on the functional significance of the match, and the mismatch score σ is statistically computed from the mutations observed in nature. Point Accepted Mutation (PAM) [9] and Blocks Substitution Matrix (BLOSUM) [3] are the two most widely used similarity schemata. The main focus of these schemata is to model mismatch scores based on the observed probability of the mutation under comparison. A similar approach may be useful when comparing system call sequences of polymorphic variants of malware. However, we are comparing system call sequences of the same code. We observed that malware polymorphism happens mostly during the propagation step, i.e., while the malware sample creates a copy of itself, while runtime polymorphism is less common. Moreover, achieving the same functionality by replacing a system call is difficult. That is, the probability of mutation in the system call sequences extracted from two executions of the same malware sample is very small. This means that mismatches of system calls are less common.

In our case, the challenge is to meaningfully quantify match and mismatch in the case of system calls. There may be a varying number of arguments associated with each system call event. Not all arguments are equally important for similarity computation. As discussed earlier, alignments of some system calls are more important than others. For example, we want to prioritize the alignment of NtCreateProcess over NtQueryValueKey because creating a process is a more critical event compared to reading a registry value. We can assign a high similarity value for a match of a critical system call, which helps build an “anchor point” during the alignment process. In our current model, the list of such critical system calls includes system calls that create and terminate processes and threads. We propose the following similarity-scoring schema for computing similarity between two system calls.

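$$\mathit{Sim}(a, b) = \mathit{Bias}(a, b) \cdot \bigl(\mathit{NameSim}(a, b) + \mathit{AttribSim}(a, b)\bigr)$$

where,

$$\mathit{NameSim}(a, b) = \begin{cases} \mathit{wt} & \text{if } name(a) = name(b), \\ \mathit{nwt} & \text{if } name(a) \neq name(b), \end{cases}$$

$$\mathit{AttribSim}(a, b) = \begin{cases} \mathit{wa} & \text{if } name(a) = name(b) \text{ and } attrib(a) = attrib(b), \\ \mathit{nwa} & \text{if } name(a) = name(b) \text{ and } attrib(a) \neq attrib(b), \\ 0 & \text{if } name(a) \neq name(b), \end{cases}$$

and

$$\mathit{Bias}(a, b) = \begin{cases} \mathit{wb} & \text{if } name(a) \text{ or } name(b) \text{ is an important system call}, \\ 1 & \text{otherwise.} \end{cases}$$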

Here, a and b are system call events, and name() and attrib() have the meaning described in Section 2. In practice, similar system calls are those calls that perform similar actions on similar operating system objects.
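The scoring schema can be sketched as a few small functions. Everything numeric here is a placeholder: the paper does not give parameter values at this point, so `WT`, `NWT`, `WA`, `NWA`, and `WB` are hypothetical values chosen only to respect the ordering constraints discussed under parameter selection, and the `IMPORTANT` set is an illustrative guess at "calls that create and terminate processes and threads". One further assumption: when an event has no attrib() at all, the attribute term contributes 0, which follows the remark in the parameter-selection discussion that such a match scores wt.

```python
from typing import NamedTuple, Optional


class Call(NamedTuple):
    name: str
    attrib: Optional[str] = None  # None: no OS object associated with the call


# Hypothetical parameter values, for illustration only.
WT, NWT = 4.0, -6.0   # name match / name mismatch
WA, NWA = 2.0, -1.0   # attrib match / attrib mismatch
WB = 10.0             # bias multiplier for important calls

# Illustrative set of "important" calls (process and thread lifecycle).
IMPORTANT = frozenset({
    "NtCreateProcess", "NtTerminateProcess",
    "NtCreateThread", "NtTerminateThread",
})


def name_sim(a: Call, b: Call) -> float:
    return WT if a.name == b.name else NWT


def attrib_sim(a: Call, b: Call) -> float:
    if a.name != b.name:
        return 0.0
    if a.attrib is None or b.attrib is None:
        return 0.0  # assumption: nothing to compare, so no bonus or penalty
    return WA if a.attrib == b.attrib else NWA


def bias(a: Call, b: Call) -> float:
    return WB if a.name in IMPORTANT or b.name in IMPORTANT else 1.0


def sim(a: Call, b: Call) -> float:
    return bias(a, b) * (name_sim(a, b) + attrib_sim(a, b))
```

Because the bias multiplies the whole sum, a matched important call receives a large positive score and can act as an anchor, while a mismatch involving an important call is penalized just as strongly.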

3.2.2 Gap Penalty

Another important component of the sequence alignment algorithm is the gap penalty schema. In general, a gap penalty is a negative score added to the similarity score to discourage indels (insertions or deletions). A large gap penalty is effective in aligning sequences properly if the majority of the sequences are identical. However, in our case, we expect to have gaps in the sequence because of noise and evasion. Since our goal is to properly identify the gap introduced by evasion, we want, to some extent, to encourage long gaps in the alignment.

There are three main types of gap penalties used in the context of biological sequences: constant, linear, and affine gap penalties. The constant gap penalty simply gives a fixed negative score for each gap opening. This value does not depend on the length of the gap. This is a simple and fast schema. However, this schema gives too much freedom for sequence alignment, resulting in unnecessarily long gaps. The linear gap penalty, as the name implies, increases the penalty score in proportion to the length of the indel. This method favors shorter gaps by severely penalizing long indels, which is not suitable in our context. The affine gap penalty combines both constant and linear gap penalties, taking the form ga + (gb ∗ L), where L is the length of the gap. That is, it assigns a gap opening penalty ga, and the penalty then grows at the rate gb per unit of gap length. We can use a smaller value for gb to favor longer gaps. By choosing |ga| > |gb|, we can model a gap penalty such that it is easier to extend a gap than to open it. We use this model of gap penalty when aligning system call sequences.
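A short numeric comparison shows why the affine form suits long evasion gaps. The values of `GA` and `GB` below are hypothetical, picked only so that |ga| > |gb|; the three function names are likewise illustrative.

```python
GA = -5.0   # hypothetical gap opening penalty
GB = -0.5   # hypothetical gap extension rate, with |GA| > |GB|


def constant_gap(length: int) -> float:
    """Fixed penalty per gap opening, independent of gap length."""
    return GA


def linear_gap(length: int) -> float:
    """Penalty proportional to gap length (here at the opening rate)."""
    return GA * length


def affine_gap(length: int) -> float:
    """Affine penalty ga + (gb * L): costly to open, cheap to extend."""
    return GA + GB * length


# Penalties for a short gap and for a long, evasion-sized gap.
for L in (1, 100):
    print(L, constant_gap(L), linear_gap(L), affine_gap(L))
```

For a gap of 100 events, the linear schema at this rate charges -500 while the affine schema charges -55, so a long gap stays affordable, yet opening many short gaps remains expensive, unlike with the constant schema.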

3.2.3 Parameter Selection

In our approach to system call alignment, as in any other alignment problem, there are certain constraints we need to follow when choosing the similarity score and gap penalty parameters. More precisely, we use the following inequality as a guideline for choosing parameter values:

nwt ≤ ga < gb < 0 < wt + nwa < wt < wt + wa

Here, we want all mismatches and indels to have negative values and all matches, including partial matches, to have positive values. Intuitively, a match where both name() and attrib() match gets the highest score (wt + wa). A name() match with an attrib() mismatch gets a score (wt + nwa) that is lower than the score (wt) of a name() match where there is no attrib() associated with the events to compare, such as for the NtYieldExecution system call event. By choosing nwt ≤ ga, we favor gaps over mismatched alignments. The inequality relation among the parameters and their relative values are more important than the actual values of the parameters. If all parameters are scaled by the same factor, the final alignment output of the algorithm remains the same.

Furthermore, the bias multiplier wb used to compute Bias(a, b) needs to be large enough to overcome the penalty introduced by the long gaps expected in the case of evasive samples. For example, we want to prioritize the alignment of the NtTerminateThread system call, which is usually located towards the end of the sequences, so aligning it requires a long gap.
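The ordering guideline is easy to check mechanically before running an alignment. The sketch below encodes it as a single chained comparison; the function name `satisfies_guideline` and the concrete numbers are hypothetical, since the paper states only the ordering at this point.

```python
def satisfies_guideline(wt: float, nwt: float, wa: float, nwa: float,
                        ga: float, gb: float) -> bool:
    """Check nwt <= ga < gb < 0 < wt + nwa < wt < wt + wa."""
    return nwt <= ga < gb < 0 < wt + nwa < wt < wt + wa


# Hypothetical values that respect the ordering.
params = dict(wt=4.0, nwt=-6.0, wa=2.0, nwa=-1.0, ga=-5.0, gb=-0.5)
ok = satisfies_guideline(**params)

# Scaling every parameter by the same positive factor preserves the ordering.
scaled = {name: 10 * value for name, value in params.items()}
ok_scaled = satisfies_guideline(**scaled)

# Swapping the roles of ga and gb makes extending a gap costlier than
# opening one, which violates the guideline.
bad = dict(params, ga=-0.5, gb=-5.0)
ok_bad = satisfies_guideline(**bad)
```

The check does not cover wb, because the text only requires it to be "large enough" relative to the expected gap lengths, which depends on the traces being aligned.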

3.2.4 Deduplication

Sometimes a tight loop in the execution may produce a long sequence of repeated short subsequences. Such a repetition may contain thousands of system calls, excessively increasing the space and time requirements of sequence alignment. To address this, we identify contiguously repeating subsequences of system calls of length one, two, and three. If such a subsequence repeats more than five times contiguously, we discard all remaining repetitions during sequence alignment. There are two advantages in doing this. First, it greatly reduces the space and time requirements of sequence alignment. Second, it prevents possible inaccurate detection of the longest gap due to a difference in the repetition count between the two call sequences.
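The deduplication step can be sketched as a single left-to-right scan. This is one possible reading of the description, not the authors' code: the function name `deduplicate` is hypothetical, units are tried shortest first, and a run of more than five repetitions is truncated to exactly five.

```python
from typing import List


def deduplicate(seq: List[str], max_unit: int = 3,
                max_repeats: int = 5) -> List[str]:
    """Truncate contiguous repetitions of short subsequences.

    A unit of length 1..max_unit that repeats more than max_repeats
    times in a row is kept max_repeats times; the rest is discarded.
    """
    out: List[str] = []
    i = 0
    while i < len(seq):
        collapsed = False
        for unit in range(1, max_unit + 1):
            block = seq[i:i + unit]
            if len(block) < unit:
                break  # not enough events left for this unit length
            reps = 1
            while seq[i + reps * unit:i + (reps + 1) * unit] == block:
                reps += 1
            if reps > max_repeats:
                out.extend(block * max_repeats)
                i += reps * unit  # skip the whole repeated run
                collapsed = True
                break
        if not collapsed:
            out.append(seq[i])
            i += 1
    return out


# A tight loop issuing the same two calls seven times in a row.
trace = ["NtOpenKey"] + ["NtReadVirtualMemory", "NtClose"] * 7 + ["NtTerminateProcess"]
short = deduplicate(trace)
```

Applying the same truncation to both traces means a loop that ran 1,000 times in one environment and 1,200 times in the other contributes the same five repetitions to each, so the difference in iteration count cannot masquerade as a gap.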

3.2.5 Difference Pruning

Accurately identifying the largest gap corresponding to the evasion is critical in finding the evasion point. However, there is a possibility of short subsequence alignments breaking the large gap associated with the evasion. This may cause the algorithm to incorrectly pick the largest gap and, in turn, return the incorrect evasion point. To mitigate this problem, we apply a difference pruning process to prune possibly-incorrect small alignments in between large gaps. Let S be the sequence alignment output of two sequences A and B. Let Sa, Sb, and Sc be three consecutive regions of S where Sb is a match alignment and Sa and Sc are gap alignments such that they are both insertion or both deletion alignments. We discard the Sb alignment region to combine the two gaps corresponding to Sa and Sc if Sb is very small relative to the combined length of Sa and Sc. More precisely, if length(Sb)/(length(Sa) + length(Sc)) < td, we discard Sb and join Sa and Sc to find the largest gap. Through a series of experiments described in Section 5, we obtained the optimal value of td = 0.02. That is, if the length of a match region between two gaps is less than 2% of the sum of the lengths of the gaps, we prune the match region and connect the gaps. This process only affects the calculation of the largest gap without affecting the actual sequence alignment output. The pruning is performed in a single pass without updating the underlying sequence to avoid a newly-formed longer gap destabilizing the pruning process.
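A minimal sketch of difference pruning is shown below. It assumes the alignment output is already summarized as a list of (kind, length) regions with kind in "match", "ins", or "del"; that representation, the function name `largest_gap`, and the decision to count only gap lengths (not the pruned match) in the joined gap are assumptions made for illustration.

```python
def largest_gap(regions, td=0.02):
    """Return (length, start_index) of the largest gap after difference pruning.
    `regions` is a list of (kind, length) tuples (hypothetical representation)."""
    # Pass 1: mark match regions to prune, using the original lengths only,
    # so a newly joined gap never influences later pruning decisions.
    pruned = set()
    for i in range(1, len(regions) - 1):
        (ka, la), (kb, lb), (kc, lc) = regions[i - 1], regions[i], regions[i + 1]
        if kb == "match" and ka == kc and ka in ("ins", "del"):
            if lb / (la + lc) < td:
                pruned.add(i)
    # Pass 2: measure gaps, joining neighbours across pruned match regions.
    best = (0, None)
    i = 0
    while i < len(regions):
        kind, length = regions[i]
        if kind == "match":
            i += 1
            continue
        start, total = i, length
        while i + 2 < len(regions) and (i + 1) in pruned:
            total += regions[i + 2][1]
            i += 2
        if total > best[0]:
            best = (total, start)
        i += 1
    return best
```

In the usage below, a 4-event match between two deletion gaps of 300 and 400 events is below the 2% threshold (4/700), so the two gaps are joined and become the largest gap; without pruning, the 500-event insertion gap would have been selected instead.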

3.3 Handling Sequence Branching

All sequence alignment algorithms from bioinformatics can only handle monolithic single sequences. However, a sequence of system calls may include calls from multiple threads. The main process thread can create multiple threads, which in turn can create more threads. Hence, a system call sequence has an inherent tree structure. A naïve way of combining system calls from multiple threads into a single call sequence can produce anomalous sequence alignment.

We propose a recursive algorithm to handle branched sequences. The input of this algorithm is a single system call sequence of a process where system calls from all threads are chronologically merged. Each event in the sequence is tagged with its corresponding thread ID. First, we preprocess this system call sequence to generate a branching sequence structure by sequentially inspecting events from the start of the sequence. Whenever a new thread is encountered, we insert a new meta-node at the location where the thread was created. This is the location of the NtCreateThread system call corresponding to the thread. We create a new blank sequence and associate it with the new meta-node. A meta-node represents a branching point in the main sequence. We remove all occurrences of system calls associated with the new thread from the main sequence and append them to the newly created sequence associated with the meta-node. The one-to-one mapping of a new thread event and its corresponding NtCreateThread may not always be available in the execution profile. In that case, we assign a new thread event to the last unassigned call of NtCreateThread. During the alignment process, two meta-nodes are recursively processed to compute the similarity score. That is, to compute the similarity score between two meta-nodes, we first perform sequence alignment of the sequences corresponding to the meta-nodes. Similarity is then computed as the difference between the total length of the matching sections and the total length of the mismatch sections of the alignment output. If at least one of the two arguments to Sim(a, b) is a meta-node, the following similarity-scoring schema is used.

$$\mathrm{Sim}(a, b) = \mathrm{MSim}(a, b) \tag{1}$$

where

$$\mathrm{MSim}(a, b) = \begin{cases} s_m - s_g & \text{if } a \text{ and } b \text{ are meta-nodes,} \\ -n_a & \text{if only } a \text{ is a meta-node,} \\ -n_b & \text{if only } b \text{ is a meta-node.} \end{cases}$$

Here, $s_m$ is the total length of all matching sections of the alignment output corresponding to meta-nodes a and b, $s_g$ is the total length of all gap sections of the alignment output, $n_a$ is the length of the sequence corresponding to meta-node a, and $n_b$ is the length of the sequence corresponding to meta-node b. Note that if meta-nodes a and b correspond to two completely different threads, $s_g$ will be greater than $s_m$, resulting in a negative similarity score.
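The preprocessing that turns a chronologically merged trace into a branching structure can be sketched as below. This is an illustrative reading of the description, not the authors' code: the `MetaNode` class, the `build_branches` function, the (thread_id, call_name) event format, and the fallback used when no unassigned NtCreateThread exists are all assumptions.

```python
class MetaNode:
    """Branching point holding the call sequence of one spawned thread."""
    def __init__(self, tid):
        self.tid = tid
        self.sequence = []

def build_branches(events):
    """events: chronologically merged list of (thread_id, call_name)."""
    main_tid = events[0][0]
    seqs = {main_tid: []}     # thread id -> that thread's own sequence
    unassigned = []           # (sequence, index) of unclaimed NtCreateThread calls
    pending = []              # (parent sequence, index, meta-node) to insert later
    for tid, name in events:
        if tid not in seqs:
            node = MetaNode(tid)
            if unassigned:
                # Assign the new thread to the last unassigned NtCreateThread.
                parent, idx = unassigned.pop()
            else:
                # Assumption: with no candidate call, branch at the current
                # end of the main sequence.
                parent, idx = seqs[main_tid], len(seqs[main_tid]) - 1
            pending.append((parent, idx, node))
            seqs[tid] = node.sequence
        seq = seqs[tid]
        seq.append(name)
        if name == "NtCreateThread":
            unassigned.append((seq, len(seq) - 1))
    # Insert meta-nodes right after their creating call, highest index first,
    # so that insertions do not shift positions that are still pending.
    for parent, idx, node in sorted(pending, key=lambda p: p[1], reverse=True):
        parent.insert(idx + 1, node)
    return seqs[main_tid]
```

The returned main sequence contains plain call names and `MetaNode` objects; an aligner can then score a pair of meta-nodes recursively, as in the MSim definition above.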

4. EVASION SIGNATURE EXTRACTION

In the previous section, we described an evasion point as the location of deviation in the call event sequence corresponding to the evasion. In this section, we describe how we use this information to extract the evasion signature.

4.1 Evasion Section

Intuitively, all system calls, API calls, and comparison events used to make an evasion decision must happen before the evasion point. We observed that such events are usually located close to the evasion point. To capture the locality of such events in the sequence, we define an evasion section, which consists of the event sequence prior to but close to the evasion point. More precisely, let E be the sequence of malware execution events that consists of all system call events, user API call events, and comparison events. The evasion section E′ of the event sequence E is defined as:

$$E' = \{\, e \in E(i) : k - \omega \le i < k \,\},$$

where k is the index of the evasion point and ω is the size of the evasion section.

If ω is large enough, E′ can extend all the way to the beginning of the event sequence E. This case guarantees that P ⊂ E′ and Q ⊂ E′, where P and Q are the call events and comparison events related to evasion, as introduced in Section 2. That is, the evasion signature ∆ ⊂ E′, since ∆ = P ∪ Q. However, with large values of ω, the evasion section E′ also includes many other events that are not related to evasion. By reducing the value of ω, we can reduce the number of such unrelated events and improve the relation ∆ ≈ E′. We also observed that the comparison events in Q that are used for evasion are likely to be performed very close to the evasion point k. This allows us to reduce ω to smaller values and still have Q ⊂ E′. This approach might exclude call events made earlier in the sequence whose results are used later for evasion. To mitigate this, we include all call events that are related to comparison events in Q into the evasion signature.
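The evasion section is simply a window of ω events ending just before the evasion point. A minimal sketch, assuming the event sequence E is a Python list and using the hypothetical name `evasion_section`:

```python
def evasion_section(events, k, omega):
    """Return the events E(i) with k - omega <= i < k.
    The lower bound is clamped at 0 so a large omega reaches the start of E."""
    return events[max(0, k - omega):k]
```

Clamping the lower bound reproduces the case discussed above, where a sufficiently large ω extends the section all the way to the beginning of the sequence.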


Notice that, unlike the previous sequence alignment step, in this step a call event includes both system calls and API calls. Although many user API calls correspond to system calls, many user mode APIs may not trigger any system call. For example, the user mode API GetTickCount in Windows does not invoke any system call (native API). However, this API is widely used in timing-based evasions. We must include such call events in the evasion signature to make it more accurate and complete.

Initially, we set P as the set of all call events in the evasion section E′, and Q as the set of all comparison events in E′. However, even with smaller values of ω, the evasion section E′ still contains unrelated call events. In the next section, we describe our approach to filtering out these unrelated events using statistical observations.

4.2 Inverse Document Frequency

A call event used to retrieve information from the analysis environment for fingerprinting is usually unique to the evasive behavior. The majority of the malware samples that are not evasive do not retrieve those unique pieces of information. Similarly, if the same call event e (same name(e) and attrib(e)) is present in the call sequences of all non-evasive malware, such a call event is less likely to be used for evasion. We can filter out call events from the evasion section E′ that occur too often in the collection of call sequences of non-evasive malware. To perform such filtering, we use an inverse document frequency-based metric.

Inverse document frequency (idf) is commonly used in information retrieval [31]. It is a measure of whether a term is common or rare across all documents. Formally, the inverse document frequency of a term t in a collection of documents D is defined as:

$$\mathrm{idf}(t, D) = \log \frac{N}{\mathrm{df}_t},$$

where N is the total number of documents in the corpus and $\mathrm{df}_t$ is the document frequency, defined as the number of documents in the collection D that contain the term t.

In our case, a call event is a term, and the collection of call sequences of non-evasive malware is the document corpus D. For a call event e, a large value of idf(e, D) implies that the call event e is unique, and a small value of idf(e, D) implies that e is commonplace. Here, idf(e, D) = 0 means the call event e is present in all call sequences of D.

We define a threshold τ such that, if idf(e, D) < τ, we consider the call event e to be a common event having little or no discriminating power for building evasion signatures. We remove such call events {e : idf(e, D) < τ} from P.
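The idf-based filter can be sketched in a few lines. The names `idf_table` and `filter_common` are hypothetical, the natural logarithm is an assumption (the text does not state the base), and treating events never seen in the corpus as maximally rare is also an assumption.

```python
import math

def idf_table(corpus):
    """corpus: list of call sequences from non-evasive samples (the set D).
    Returns a dict mapping each observed call event to idf(e, D)."""
    n = len(corpus)
    df = {}
    for seq in corpus:
        for event in set(seq):          # count each event once per document
            df[event] = df.get(event, 0) + 1
    return {e: math.log(n / c) for e, c in df.items()}

def filter_common(P, idf, tau):
    """Drop call events with idf(e, D) < tau from the candidate set P.
    Events absent from the corpus are kept (assumed to be maximally rare)."""
    return {e for e in P if idf.get(e, math.inf) >= tau}
```

An event that appears in every non-evasive trace gets idf 0 and is removed for any positive τ, while an event seen in only one trace, or in none, survives the filter.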

4.3 Event Dependency Analysis

The next component of the evasion signature is the comparison events Q used for altering control flow during evasion. Comparison events can be monitored with any fine-grained instruction-level execution monitoring. However, we are interested in only those comparisons that involve the use of information generated by previous call events. To track the information returned by call events, we leverage taint analysis. To this end, we build upon the work of the Anubis extension proposed in [5]. Anubis [1] is a malware analysis framework that uses Qemu-based full-system emulation as the execution environment. In this approach, information returned by all call events is tainted at the byte level. Inside the Qemu intermediate language, all comparison instructions of the x86 architecture are translated into the same intermediate comparison instruction. For each comparison, the taint labels of the operands are examined to determine the corresponding call events that produced the data byte. Consecutive comparisons are merged into a single comparison event. If the comparison is performed with a constant, the constant value is also extracted.

Besides taint analysis, we also analyze handle dependencies between call events. This allows us to generate a more descriptive value of attrib(e) for the call event e. For example, if a registry key HKLM/System is opened by a call to NtOpenKey and the returned handle is later used for a call to NtEnumerateKey, we use the registry key name as the attrib(e) for the call event NtEnumerateKey.

An execution of a program contains many comparisons even if only comparisons with tainted operands are considered. However, many of such comparisons originate from within API functions rather than the actual malware code. For this reason, comparisons inside user API calls are discarded, except for API calls that are designed specifically for data type comparison, such as strings and dates. Comparison events are included in the execution profile of the malware along with the system call and user API call events.

We build the sequence of malware execution events E from the execution profile generated by Anubis. We also extract the system call sequence from another execution environment that the malware evaded. Since the evasion code executes in both environments, we can extract its evasion signature from the Anubis execution profile regardless of which environment is evaded. We identify the evasion point and evasion section E′ using the approach described in the previous sections. We extract the call events P and the comparison events Q from the evasion section E′. We filter P using the idf-based method described previously.

Finally, all call events associated with Q are added to the set P. The union of P and Q represents our final evasion signature ∆ = P ∪ Q.

4.4 Clustering

Given a collection of evasive samples, we propose to assess different evasion techniques present in the collection based on the extracted evasion signatures. To do this, we perform hierarchical clustering of evasive samples. This allows a malware analyst to prioritize and selectively study different evasion techniques without analyzing randomly selected samples. To perform manual assessment of a particular cluster, we can take an intersection of the evasion signatures of all samples from that cluster. That is, we inspect the evasion signature elements that are common to all samples in the cluster.

A hierarchical clustering requires a method to compute pairwise similarity between two evasion signatures. An evasion signature is essentially a set. We compute similarity between two evasion signatures ∆a and ∆b as the Jaccard similarity J, which is given as:

$$J(\Delta_a, \Delta_b) = \frac{|\Delta_a \cap \Delta_b|}{|\Delta_a \cup \Delta_b|}.$$
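As a small illustration, the Jaccard similarity of two signatures represented as Python sets can be computed as follows. The function name `jaccard` and the convention of returning 1.0 for two empty signatures are assumptions; the signature elements in the usage are made-up placeholders.

```python
def jaccard(sig_a, sig_b):
    """Jaccard similarity between two evasion signatures given as sets."""
    a, b = set(sig_a), set(sig_b)
    if not a and not b:
        return 1.0   # assumption: two empty signatures count as identical
    return len(a & b) / len(a | b)
```

A hierarchical clustering routine would typically consume the corresponding distance 1 − J, so identical signatures are at distance 0 and disjoint signatures at distance 1.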

The result of a hierarchical clustering depends on the choice of the linkage method and the similarity measure, where the former is usually more critical than the latter [32]. There are two main choices of linkage methods: single-linkage and complete-linkage. We use the complete-linkage method for our clustering. This is because the complete-linkage method prefers compact clusters with small diameters over long, straggly clusters [21]. As we want maximum similarity between all pairs of members in a cluster for assessment, the complete-linkage method best fits our purpose.

5. EVALUATION

We evaluated our approach on real-world Windows-based evasive malware samples. We made this choice because the majority of the evasive malware is observed on this platform. Moreover, the majority of the malware analysis systems are also focused on the same platform.

5.1 Execution Environments

In our evaluation, we provide two execution environments based on emulation and hardware virtualization, respectively.

5.1.1 Emulation

We use Anubis [1] to extract malware execution events from an emulated environment. Anubis performs execution monitoring by observing the execution of precomputed guest memory addresses. These memory addresses correspond to system call functions and user API functions. Anubis is able to extract additional information about the API execution by inserting its own instructions into the emulator's instruction execution chain. Besides system calls, we are able to extract additional information, such as user API calls and comparison events, which are necessary for building evasion signatures.

5.1.2 Hypervisor

We use Ether [10] to extract malware execution events from a hardware-based virtualized environment. Ether is a Xen-hypervisor-based transparent malware analysis framework that utilizes Intel VT's hardware virtualization extensions [2]. The hardware virtualization makes it possible to execute most of the malware instructions as native CPU instructions on the real hardware without any interception. Thus, it does not suffer from inaccurate or incomplete system emulation. It was observed that Ether can be evaded in its default setup because it uses QEMU's device model to provide virtualized hardware peripherals [17]. We modified the device model used by Ether to prevent such evasion.

5.2 Dataset

The input for our system is a collection of known evasive malware samples. For the evaluation of our system, we received 3,107 evasive samples identified by the BareCloud [17] system. We analyzed those samples in Anubis and our modified Ether environments. We extracted system call traces and computed behavior deviation scores as proposed in [17]. We found that 2810 samples evaded Anubis with respect to our Ether environment.

To build the ground truth dataset, we randomly selected 52 samples out of the 2810 evasive samples. We manually analyzed those samples and identified the calls and the comparisons that are related to the evasion. This information constitutes the evasion signature ∆ of the malware samples. To evaluate the alignment algorithm, which works only on the system call sequences, we identified the most important system call that is critical to the evasion technique as the evasion call. In case multiple related system calls are used, we selected the last system call as the evasion call. For instance, let us take an example evasion instance that opens a registry key HKLM/HARDWARE/Description/System using NtOpenKey and reads the value of the key SystemBiosVersion using NtQueryValueKey. Inside Anubis, the returned value is QEMU -1 because of the underlying Qemu subsystem, which can be checked for evasion. In this example, both system calls are related to evasion. However, we select the last call to NtQueryValueKey as the evasion call. We note the index of this instance of the system call in the sequence as the data point used later in the experiments.

5.3 Algorithm Evaluation

In our approach, accurately finding the evasion point is the first and critical step towards extracting evasion signatures. This depends on the accuracy of the proposed sequence alignment algorithm for system calls. The accuracy depends on several parameters used by the algorithm. In Section 3.2, we discussed some guidelines for choosing optimal parameters for the algorithm. However, there is no previous work in this area. Unlike in the field of bioinformatics, an appropriate labeled dataset is lacking to build a statistical model of similarity score for system call sequences. We use an incremental approximation-based approach to find optimal values of the parameters, which we describe in the next section.

5.3.1 Experiment with Scoring Function

To evaluate our guideline, we performed several experiments by varying different scoring parameters. For this, we first chose to vary a set of four main parameters (ga, gb, nwt, wt; see Section 3.2). Our preliminary experiments showed that the values of these parameters play a major role in the algorithm output. For the remaining parameters, we empirically assigned constant values. For each set of parameter values (ga, gb, nwt, wt), we performed sequence alignment to find the corresponding evasion point. Let Am be a sequence corresponding to a malware m and let km be the index of the calculated evasion point. Let e′m be the index of the evasion call in Am, which is known from the ground truth. We say that an evasion section of width w′ successfully captures the evasion call if km − w′ ≤ e′m < km. That is, the evasion call is within the evasion section defined by w′. For a set of N samples, we compute the recall rate corresponding to the parameter set (ga, gb, nwt, wt) and evasion section width w′ as TP/N, where TP is the number of samples whose evasion call is within the evasion section defined by w′.

Figure 2 shows the recall rate of some parameter sets when varying the evasion section width w′. We used the ground truth dataset described in Section 5.2. The area under the curve (AUC) represents the relative performance of the choice of the parameters. The result validates some of our initial intuitions. For example, a choice of |ga| > |nwt| decreases the algorithm performance (compare top and second curves), a relatively large score for a match compared to the gap penalty degrades performance (third curve), and a large gap extension penalty gb is not favorable (top and bottom curves).
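The recall computation described above is straightforward to express in code. The sketch below assumes each ground-truth sample is reduced to a pair (km, e′m) of computed evasion point and true evasion call index; the function name `recall_rate` and the example numbers are made up for illustration.

```python
def recall_rate(samples, width):
    """samples: list of (k_m, e_m) pairs, where k_m is the computed evasion
    point and e_m the index of the ground-truth evasion call.
    A sample is a hit if k_m - width <= e_m < k_m."""
    hits = sum(1 for k, e in samples if k - width <= e < k)
    return hits / len(samples)
```

Evaluating `recall_rate` over a range of widths yields one curve of the kind plotted in Figure 2; comparing the areas under such curves ranks the parameter sets.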

There are many possible combinations of parameter choices. To find the optimal choice, we computed AUC values for all possible combinations when ga, gb, nwt, and wt are each selected from a set of 10 values ranging from −10 to +10. That is, for each malware sample there are 10,000 test cases. To test the correctness of our guidelines, we also included values that do not satisfy the inequality guideline. Figure 3 shows the AUC values for all parameter combinations. The combination of parameters ga = −2, gb = −0.10, nwt = −2, wt = 3 produces the highest AUC value (85.214).

[Figure 2 plot: recall rate (y-axis, 0.0 to 1.0) against evasion section width ω (x-axis, 0 to 100) for four parameter sets: (ga = −2, gb = −0.10, nwt = −2, wt = 3); (ga = −10, gb = −0.01, nwt = −2, wt = 3); (ga = −1, gb = −0.10, nwt = −2, wt = 10); (ga = −2, gb = −0.50, nwt = −2, wt = 3).]

Figure 2: Parameter tuning experiment (wa = 2 and nwa = −2).

In the next step, we performed another set of similar experiments by varying the values of the other parameters while keeping ga, gb, nwt, and wt set to the optimal values obtained from the previous experiments. In this way, we obtained the optimal values wa = 2, nwa = −2, wb = 20, and the difference pruning threshold td = 0.02.

5.3.2 Comparison with LCS

In this experiment, we compared our sequence alignment algorithm with the standard diff algorithm used in the Unix diff utility [23]. We computed the corresponding evasion point using both algorithms and compared their performance by computing their recall rates when varying ω. The result of this experiment, shown in Figure 4, clearly shows that our proposed alignment algorithm outperforms the LCS-based algorithm. This also shows that the LCS-based approach is weak. More than half of the time, the evasion locations identified using the LCS-based approach were incorrect.

In this result, we can see that with ω > 83 we achieved a 100% recall rate. That is, all evasion calls of the ground truth dataset are captured when ω > 83. We selected a more conservative value of ω = 100 for our next signature extraction experiments.

5.3.3 Evasion Signature Extraction

The next step in the extraction of evasion signatures is to build the idf-based filter as described in Section 4.2. For this, we obtained 119 non-evasive malware samples from the BareCloud system [17]. These are the samples that did not exhibit evasive behavior in bare metal, Anubis, Ether, and VirtualBox analysis environments. This dataset represents the non-evasive malware sample set D as described in Section 4.2. We analyzed these samples in the Anubis environment and extracted system calls, user API calls, and comparisons along with the taint dependency information. From those extracted events, we calculated the idf values for all observed events.

[Figure 3 plot: AUC (y-axis, 65 to 85) against parameter combinations (ga, gb, nwt, wt) on the x-axis.]

Figure 3: Average AUC values of recall rates corresponding to 10,000 parameter combinations.

In the next step, we find the optimal idf-based filter threshold τ for filtering out the common execution events. As described in Section 4.2, we filter out a call event e if idf(e, D) < τ. We want a larger value of τ because we want to filter out as many common events as possible. A value of τ that is too small may not filter anything, and a value that is too large may filter out events that are part of the evasion signature. To find the optimal value for τ, we first extracted multiple evasion signatures of the ground truth samples by setting different values of τ. To compare the quality of the extracted signatures, we performed a precision-recall analysis of the extracted evasion signatures. Let ∆′ be the automatically extracted signature and let ∆ be the true evasion signature, which is available from the ground truth samples. The precision and recall of the evasion signature extraction are given as:

$$\text{precision} = \frac{|\Delta \cap \Delta'|}{|\Delta'|}, \qquad \text{recall} = \frac{|\Delta \cap \Delta'|}{|\Delta|}.$$
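The two measures can be computed directly from set operations. In this sketch, `precision_recall` is a hypothetical helper and the signature elements are placeholders; returning 0.0 for an empty denominator is an assumption.

```python
def precision_recall(extracted, truth):
    """extracted: automatically extracted signature (set of events);
    truth: manually identified signature (set of events)."""
    common = len(extracted & truth)
    precision = common / len(extracted) if extracted else 0.0
    recall = common / len(truth) if truth else 0.0
    return precision, recall
```

Averaging these values over the ground-truth samples for each candidate τ gives curves of the kind used to choose the threshold.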

Figure 5 shows the results of the precision and recall analysis. The curves represent the average characteristics of all samples. We can see that the precision decreases and recall increases as we lower the value of τ. Smaller values of τ make the idf-based filter weaker, and, hence, the signature includes a lot of common events, lowering its precision. We can see that the idf-based filter significantly increases the quality of the extracted evasion signatures if τ is selected optimally. The precision and recall at the crossover point, where the precision and recall curves meet, are both 0.83, and the value of the threshold at this point is τ = 2.75. That is, at τ = 2.75, the algorithm is able to extract 83% of the elements of true evasion signatures with a precision of 83%. We use this value for our next experiment on cluster analysis. Figure 6 shows a sample evasion signature automatically extracted by our system from a malware sample.

[Figure 4 plot: recall rate (y-axis, 0.0 to 1.0) against evasion section width ω (x-axis, 0 to 100); curves: MalGene, LCS.]

Figure 4: Comparison of evasion gap detection.

[Figure 5 plot: precision and recall rate (y-axis, 0.0 to 1.0) against idf threshold τ (x-axis, 0 to 10); curves: Precision, Recall.]

Figure 5: Precision recall analysis of the idf threshold τ.

5.4 Evaluation on Real-world Samples

In this experiment, we applied our approach to 2810 real-world evasive malware samples. We extracted the corresponding evasion signatures and performed hierarchical clustering as described in Section 4.4. Figure 7 shows the graphical representation of the clusters. Here, the smallest rectangles represent samples, larger rectangular patches of color represent clusters, and the shades of the color inside a patch represent the degree of similarity among individual samples within the cluster. To generate unique clusters, we cut the corresponding dendrogram close to the root of the tree. This way, the clusters formed are very distinct from each other, representing distinct evasion techniques. A cut at the height h = 0.99 produced 78 clusters. Bold lines in Figure 7 separate these clusters. A cut at the height h = 0 produced 1051 clusters. This is the number of groups of samples whose automatically extracted evasion signatures are identical.

Table 1: The summary of top five clusters.

Cluster  Count  Evasion signature summary
c6       898    Exception-based emulation detection
c4       582    Cumulative timing of system calls
c5       225    Timing of exception processing
c8       172    SystemMetrics-based fingerprinting
c18      106    Variant of exception-based detection

NtOpenKey, HKLM/System/ControlSet001/Services/Disk/Enum
NtQueryValueKey, HKLM/System/ControlSet001/Services/Disk/Enum->0
CMP, NtQueryValueKey.KeyValueInformation->’Z’
CMP, NtQueryValueKey.KeyValueInformation->’wmwavboxqemu’
CMP, NtQueryValueKey.KeyValueInformation->’qemu’

Figure 6: A sample of an automatically extracted evasion signature (8964683b959a9256c1d35d9a6f9aa4ef).

We manually analyzed a few samples from the top five clusters. Table 1 presents a summary of the findings.

Figure 7: Hierarchical clustering of evasive malwarebased on their evasion signature.

6. LIMITATIONS

The main limitation of our approach is the requirement of system call sequences from both analysis environments. This limitation prevents us from using pure bare-metal execution-based malware profiles that lack system call monitoring. One of the ways to achieve such monitoring is to use SMM-based monitoring systems, such as MALT [35].

Potentially, malware can have multiple evasion points. That is, a malware sample can perform multiple evasion checks at different sections of the system call sequence. Since we locate the deviation by finding the largest gap in the aligned sequence, our approach only finds one evasion point and, as a result, extracts only one evasion signature corresponding to that evasion point. Instead of only using the largest gap, we could consider other large gaps in the aligned sequence, if any, to identify multiple evasion points. This work is limited to one evasion signature per evasive malware sample.

An adversary with knowledge of our system can develop a mimicry attack to foil evasion signature extraction. For example, an attacker can use a Pseudo Random Number Generator (PRNG) to introduce artificial large gaps with random system calls to evade evasion point detection. However, since the malware sample evades one of the analysis systems, the actual malicious activity also causes another large gap in the sequence alignment. In this case, we could support multiple evasion points and extract separate evasion signatures from the respective evasion points. However, false positive evasion signatures, such as those produced by PRNG techniques, need to be manually identified and filtered out. The clustering of evasion signatures as described in Section 4.4 can help improve this manual process.

One of the common limitations inherent to all dynamic analysis systems is the use of stalling code. A malware sample can wait for a long time before performing any malicious activity. Kolbitsch et al. have proposed a technique to detect and mitigate malware stalling code [18]. Our current system will not be able to extract signatures for such evasions if the stalling part of the code is deterministic, producing the same call sequence.

If a malware sample has a high level of randomization in the code execution, our approach to system call alignment may not be effective. However, if the malicious activity is long enough in one of the analysis environments, the alignment algorithm may provide an approximate location of the evasion, which can help a malware analyst in further analysis. Another approach is to analyze the same malware sample multiple times in the same environment to detect and normalize such inherent randomization [20].

The proposed approach of handling sequence branching may not be effective for system call sequences produced by thread pools. This is because the order in which a thread in the thread pool is scheduled to handle callbacks can be different among instances of the malware executions.

7. RELATED WORK

7.1 Sequence Alignment

The sequence alignment problem is widely studied in bioinformatics [11, 14, 27]. Our work is based on the algorithms proposed for biological sequence alignment. We extended the algorithms to handle sequences with branches. Additionally, we adapted the algorithm and proposed optimal parameters in the context of system call sequence alignment.

Sequence alignment techniques have previously been used in malware detection for finding common subsequences as signatures and for pattern matching [8, 19, 34]. Eskin [12] proposes a sparse sequence model to find outliers in the sequences for anomaly detection. Our use of sequence alignment is orthogonal to those works. MalGene uses sequence alignment for identifying deviations between sequences rather than finding common patterns as signatures. Furthermore, our algorithm performs deduplication and difference pruning, and can handle branched sequences. Our approach to the extraction of evasion signatures leverages data-flow dependencies to extract relevant but potentially distant events in the sequence. Data-mining techniques are used to discard irrelevant events from the evasion signature.

7.2 Differential Program Analysis

The problem of analyzing the differences between two runs of a program has been previously studied [15, 33]. The work most similar to ours is the approach of differential slicing [15]. Given a pair of execution traces of the same program and a location of observed difference in the trace, differential slicing can identify the input difference that caused the observed difference. The main difference with our work is that the differential slicing approach requires fine-grained analysis in both analysis environments. This may not always be available in all malware analysis environments. Our approach does not require fine-grained instruction-level monitoring from both analysis environments. Furthermore, to find the source of the execution difference, an analyst must first manually identify the location of the observed difference before applying the differential slicing analysis. We automate the process of identifying the location of the difference. Therefore, while previous work on differential slicing is suitable for more focused individual analysis, our approach is designed to provide an automated and practical solution to approximate program difference analysis at large scale.

7.3 Evasion Detection

Chen et al. proposed a detailed taxonomy of evasion techniques used by malware against dynamic analysis systems [6]. Lau et al. employed a dynamic-static tracing technique to identify VM detection techniques. Kang et al. [16] proposed a scalable trace-matching algorithm to locate the point of execution diversion between two executions. Their system is able to dynamically modify the execution of the whole-system emulator to defeat anti-emulation checks. Balzarotti et al. [4] proposed a system for detecting dynamic behavior deviation of malware by comparing behaviors between an instrumented environment and a reference host. The comparison method is based on deterministic program execution replay. That is, the malware under analysis is first executed in a reference host while recording the interaction of the malware with the operating system. Later, the execution is replayed deterministically in an analysis environment such that any deviation in the execution is evidence of an evasion. Deterministic replay of a malware sample may be challenging if it depends on the external network environment. In our approach, a malware sample can be simultaneously executed in two environments and analyzed later.

8. CONCLUSION

In this paper, we presented MalGene, a system for automatically extracting evasion signatures from evasive malware. We proposed a combination of bioinformatics algorithms, data mining, and data flow analysis techniques to automate the signature extraction process, so that it can be applied to a large-scale dataset.

9. ACKNOWLEDGMENTS

We want to thank our shepherd Konrad Rieck and the anonymous reviewers for their valuable comments, and Christopher Kruegel for his insight and discussions throughout this project.

This work is sponsored by the Defense Advanced Research Projects Agency (DARPA) under grant N66001-13-2-4039 and by the Army Research Office (ARO) under grant W911NF-09-1-0553. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon.

10. REFERENCES

[1] Anubis. http://anubis.cs.ucsb.edu.

[2] Intel Virtualization Technology. http://www.intel.com/technology/virtualization/.

[3] S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997.

[4] D. Balzarotti, M. Cova, C. Karlberger, C. Kruegel, E. Kirda, G. Vigna, and S. Antipolis. Efficient Detection of Split Personalities in Malware. In Symposium on Network and Distributed System Security (NDSS), 2010.

[5] U. Bayer, P. M. Comparetti, C. Hlauschek, C. Kruegel, and E. Kirda. Scalable, behavior-based malware clustering. In Symposium on Network and Distributed System Security (NDSS), 2009.

[6] X. Chen, J. Andersen, Z. M. Mao, M. Bailey, and J. Nazario. Towards an Understanding of Anti-Virtualization and Anti-Debugging Behavior in Modern Malware. In Dependable Systems and Networks With FTCS and DCC, 2008.

[7] P. M. Comparetti, G. Salvaneschi, E. Kirda, C. Kolbitsch, C. Kruegel, and S. Zanero. Identifying Dormant Functionality in Malware Programs. In IEEE Symposium on Security and Privacy, 2010.

[8] S. E. Coull and B. K. Szymanski. Sequence alignment for masquerade detection. Computational Statistics & Data Analysis, 52(8):4116–4131, 2008.

[9] M. O. Dayhoff and R. M. Schwartz. A model of evolutionary change in proteins. In Atlas of Protein Sequence and Structure, 1978.

[10] A. Dinaburg, P. Royal, M. Sharif, and W. Lee. Ether: Malware Analysis via Hardware Virtualization Extensions. In ACM Conference on Computer and Communications Security (CCS), 2008.

[11] R. C. Edgar. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 2004.

[12] E. Eskin. Sparse sequence modeling with applications to computational biology and intrusion detection. PhD thesis, 2002.

[13] P. Ferrie. Attacks on virtual machine emulators. Technical report, Symantec Corporation, 2007.

[14] O. Gotoh. An improved algorithm for matching biological sequences. Journal of Molecular Biology.

[15] N. M. Johnson, J. Caballero, K. Z. Chen, S. McCamant, P. Poosankam, D. Reynaud, and D. Song. Differential slicing: Identifying causal execution differences for security applications. In IEEE Symposium on Security and Privacy, 2011.

[16] M. Kang, H. Yin, and S. Hanna. Emulating emulation-resistant malware. ACM Workshop on Virtual Machine Security. ACM, 2009.

[17] D. Kirat, G. Vigna, and C. Kruegel. BareCloud: bare-metal analysis-based evasive malware detection. In USENIX Security Symposium (USENIX), 2014.

[18] C. Kolbitsch, E. Kirda, and C. Kruegel. The Power of Procrastination: Detection and Mitigation of Execution-stalling Malicious Code. In ACM Conference on Computer and Communications Security (CCS), 2011.

[19] V. Kumar, S. K. Mishra, and L. Bhopal. Detection of malware by using sequence alignment strategy and data mining techniques. International Journal of Computer Applications, 62(22), 2013.

[20] M. Lindorfer, C. Kolbitsch, and P. M. Comparetti. Detecting Environment-Sensitive Malware. Symposium on Recent Advances in Intrusion Detection (RAID), pages 338–357, 2011.

[21] C. D. Manning, P. Raghavan, and H. Schutze. Introduction to information retrieval. 2008.

[22] D. W. Mount. Sequence and genome analysis. Bioinformatics: Cold Spring Harbour Laboratory Press: Cold Spring Harbour, 2, 2004.

[23] E. W. Myers. An O(ND) difference algorithm and its variations. Algorithmica, 1986.

[24] S. B. Needleman and C. D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 1970.

[25] R. Paleari, L. Martignoni, G. Fresi Roglia, and D. Bruschi. A fistful of red-pills: How to automatically generate procedures to detect CPU emulators. In USENIX Workshop on Offensive Technologies (WOOT).

[26] G. Pek. nEther: In-guest Detection of Out-of-the-guest Malware Analyzers. Proceedings of the Fourth European Workshop on System Security. ACM, 2011.

[27] V. Polyanovsky, M. A. Roytberg, and V. G. Tumanyan. Comparative analysis of the quality of a global algorithm and a local algorithm for alignment of two sequences. Algorithms for Molecular Biology, 2011.

[28] T. Raffetseder, C. Kruegel, and E. Kirda. Detecting System Emulators. Information Security, pages 1–18, 2007.

[29] J. Rutkowska. Red pill... or how to detect VMM using (almost) one CPU instruction, 2004.

[30] T. F. Smith and M. S. Waterman. Identification of common molecular subsequences. Journal of Molecular Biology, 1981.

[31] K. Sparck Jones. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 1972.

[32] A. J. Vakharia and U. Wemmerlov. A comparative investigation of hierarchical clustering techniques and dissimilarity measures applied to the cell formation problem. Journal of Operations Management, 1995.

[33] D. Weeratunge, X. Zhang, W. N. Sumner, and S. Jagannathan. Analyzing concurrency bugs using dual slicing. In Symposium on Software Testing and Analysis, 2010.

[34] A. Wespi, M. Dacier, and H. Debar. An intrusion-detection system based on the Teiresias pattern-discovery algorithm. IBM Thomas J. Watson Research Division, 1999.

[35] F. Zhang, K. Leach, A. Stavrou, H. Wang, and K. Sun. Using hardware features for increased debugging transparency. In IEEE Symposium on Security and Privacy, May 2015.