
Self-Learning Anti-Virus Scanner


Market Case for Program Analysis Technology

Arun Lakhotia, Professor
Andrew Walenstein, Assistant Professor
University of Louisiana at Lafayette
www.cacs.louisiana.edu/labs/SRL
AVAR 2008 (New Delhi)

Introduction

Director, Software Research Lab

Lab's focus: Malware Analysis

Graduate level course on Malware Analysis

Six years of AV-related research

Issues investigated: metamorphism, obfuscation

Alumni in the AV industry:
  Prabhat Singh, Nitin Jyoti, Aditya Kapoor, Rachit Kumar (McAfee AVERT)
  Erik Uday Kumar (Authentium)
  Moinuddin Mohammed (Microsoft)
  Prashant Pathak (ex-Symantec)

Funded by: Louisiana Governor's IT Initiative

Outline

Attack of variants
  AV vulnerability: exact match
Information Retrieval techniques
  Inexact match
Adapting IR to AV
  Accounting for code permutation
Vilo: a system using IR for AV
Integrating Vilo into the AV infrastructure
  Self-learning AV using Vilo

ATTACK OF VARIANTS

Variants vs Family

Source: Symantec Internet Threat Report, XI

Analysis of attacker strategy

Purpose of the attack of variants
  Denial of service on the AV infrastructure
  Increase the odds of passing through
Weakness exploited
  AV systems use exact match over an extract
Attack strategy
  Generate just enough variation to beat exact match
Attacker cost
  The cost of generating and distributing variants

Analyzing attacker cost

Payload creation is expensive
  Must reuse the payload
Thousands of variants are needed
  Generation must be automated
General transformers are expensive
  Specialized, limited transformers
  Hence packers/unpackers

Attacker vulnerability

Automated transformers
  Limited capability
  Machine generated, so the output must follow regular patterns
Exploiting the attacker's vulnerability
  Detect the patterns of similarity
Approach
  Information Retrieval (this presentation)
  Markov analysis (other work)

Information Retrieval

IR Basics

The basis of Google and of bioinformatics
Organizing a very large corpus of data
Key idea: inexact match over the whole
Contrast with AV: exact match over an extract

IR Problem

[Diagram: a query (keywords or a whole document) is posed against a document collection, and the IR system returns the related documents]

IR Steps

Step 1: Convert documents to vectors
  1a. Define a method to identify features (example: k consecutive words)
  1b. Extract all features from all documents
  1c. Count the features to make a feature vector

Toy documents from the slide: "Have you wondered when is a rose a rose?" and "How about onions? Onion smell stinks."
Features of the first document (k = 3, with stop words such as "is" and "a" dropped): "Have you wondered", "You wondered when", "Wondered when rose", "When rose rose"
Its feature vector over the six features of the corpus: [1, 1, 1, 1, 0, 0], the trailing zeros being the features that occur only in the onion document.
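A minimal sketch of Step 1 in Python, using the slide's two toy documents. Function and variable names are illustrative; unlike the slide, this simple version keeps stop words, so its feature list differs slightly.

from collections import Counter

def k_word_features(text, k=3):
    # Steps 1a/1b: every run of k consecutive words is one feature
    words = text.lower().split()
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]

# The two toy documents from the slide
doc1 = "Have you wondered when is a rose a rose"
doc2 = "How about onions Onion smell stinks"

# Step 1c: count the features to get one feature vector per document
vec1 = Counter(k_word_features(doc1))
vec2 = Counter(k_word_features(doc2))
print(vec1.most_common(3))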

Step 2: Compute the feature vector weights
  Take the features of the entire corpus into account
  Classical method: W = TF x IDF
    TF  = term frequency (the feature's count in the document)
    DF  = number of documents in the corpus containing the feature
    IDF = inverse of DF

Worked example from the slide, for one document vector v1:

  Feature               DF    IDF    TF(v1)   w1 = TF x IDF
  You wondered when      5    1/5       1         1/5
  Wondered when rose     7    1/7       2         2/7
  When rose rose         8    1/8       5         5/8
  How about onions       6    1/6       3         3/6
  Onion smell stinks     3    1/3       0         0/3
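Continuing the sketch above, Step 2 weights each count by the inverse document frequency. The slide takes IDF simply as 1/DF (production systems typically use a log-scaled variant); vec1 and vec2 are the Counter vectors from the previous sketch.

from collections import Counter

def document_frequencies(vectors):
    # DF: in how many documents of the corpus does each feature occur?
    df = Counter()
    for vec in vectors:
        df.update(vec.keys())
    return df

def tf_idf(vec, df):
    # W = TF x IDF, with IDF taken as 1/DF as in the slide's worked example
    return {feat: tf / df[feat] for feat, tf in vec.items()}

corpus = [vec1, vec2]          # in practice: the whole document collection
w1 = tf_idf(vec1, document_frequencies(corpus))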

Step 3: Compare vectors
  Cosine similarity between the weighted vectors, e.g. w1 = [0.33, 0.25, 0.66, 0.50]
  cos(w1, w2) = (w1 . w2) / (|w1| |w2|)
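Step 3 in code; this is the standard cosine computation over the sparse weight dictionaries produced above, not anything specific to this deck.

import math

def cosine_similarity(a, b):
    # cos = (a . b) / (|a| |b|): 1.0 for identical direction, 0.0 for no shared features
    dot = sum(weight * b.get(feat, 0.0) for feat, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)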

Step 4: Document ranking
  Rank the collection using the similarity measure
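Step 4 is then just a sort by that similarity. In the sketch below, corpus_vecs is a hypothetical mapping from document name to weighted vector.

def rank_documents(query_vec, corpus_vecs):
    # Return (document id, similarity) pairs, best match first
    scores = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in corpus_vecs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)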

[Diagram: a new document is queried against the document collection; candidates come back ranked by similarity (0.90, 0.82, 0.76, 0.30) and the best one is returned as the matching document]

Adapting IR for AV

Two code fragments from variants of the same malware family (listed below): both perform the same operations, in slightly permuted order.

l2D2:   push  ecx
        push  4
        pop   ecx
        push  ecx
l2D7:   rol   edx, 8
        mov   dl, al
        and   dl, 3Fh
        shr   eax, 6
        loop  l2D7
        pop   ecx
        call  s319
        xchg  eax, edx
        stosd
        xchg  eax, edx
        inc   [ebp+v4]
        cmp   [ebp+v4], 12h
        jnz   short l305

l144:   push  ecx
        push  4
        pop   ecx
        push  ecx
l149:   mov   dl, al
        and   dl, 3Fh
        rol   edx, 8
        shr   ebx, 6
        loop  l149
        pop   ecx
        call  s52F
        xchg  ebx, edx
        stosd
        xchg  ebx, edx
        inc   [ebp+v4]
        cmp   [ebp+v4], 12h
        jnz   short l18

Step 0: Mapping a program to a document
  Extract the sequence of operations (mnemonics only, operands dropped):
    Variant 1: push push pop push rol mov and shr loop pop call xchg stosd xchg inc cmp jnz
    Variant 2: push push pop push mov and rol shr loop pop call xchg stosd xchg inc cmp jnz

Step 1a: Defining features, the k-perm
  Writing one letter per operation, the two sequences become
    Virus 1: P P O P R M A S L O C X S X I C J
    Virus 2: P P O P M A R S L O C X S X I C J
  A feature is a permutation of k operations, so the reordered windows R M A S and M A R S count as the same feature.

Step 1, example of 3-perms
  [Slide figure: the 3-perm windows of P P O P R M A S L, P O P M A R S L and M A R S L P O P highlighted in place]

Step 2: Construct feature vectors (4-perms)
  [Slide table: counts of the 4-perm features POPR, OPRM, PRMA, RMAS, MASL, POPM, OPMA, PMAR, MARS, ARSL, RSLP, SLPO and LPOP in Virus 1 (P P O P R M A S L), Virus 2 (P O P M A R S L) and Virus 3 (M A R S L P O P)]

Step 3: Compare vectors
  Cosine similarity, as before
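A hedged sketch of Steps 0 through 2 for programs: pull the mnemonic sequence out of a disassembly listing, then count windows of k consecutive operations, sorting each window so that any permutation of the same k operations maps to one feature. This is one way to realize the k-perm idea; the parsing and the names are illustrative, not Vilo's actual implementation.

from collections import Counter

def opcode_sequence(disassembly):
    # Step 0: a program's "document" is its sequence of operation mnemonics
    ops = []
    for line in disassembly.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].endswith(":"):        # drop leading labels such as "l2D2:"
            tokens = tokens[1:]
        if tokens:
            ops.append(tokens[0].lower())  # keep the mnemonic, ignore operands
    return ops

def k_perm_vector(ops, k=4):
    # Steps 1-2: a feature is a window of k consecutive operations; sorting the
    # window makes reorderings like "rol mov and shr" / "mov and rol shr" identical
    windows = (tuple(sorted(ops[i:i + k])) for i in range(len(ops) - k + 1))
    return Counter(windows)

variant1 = "push ecx\npush 4\npop ecx\npush ecx\nrol edx, 8\nmov dl, al\nand dl, 3Fh\nshr eax, 6"
variant2 = "push ecx\npush 4\npop ecx\npush ecx\nmov dl, al\nand dl, 3Fh\nrol edx, 8\nshr ebx, 6"
v1 = k_perm_vector(opcode_sequence(variant1))
v2 = k_perm_vector(opcode_sequence(variant2))
print(cosine_similarity(v1, v2))   # high, despite the reordered instructions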

Step 4: Match new sample

Vilo: System using IR for AV

Vilo Functional View

[Diagram: a new sample is submitted to Vilo, which searches the malware collection and returns matches ranked by similarity (0.90, 0.82, 0.76, 0.30); the top match is reported as the malware match]

Vilo in Action: Query Match
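In terms of the helpers sketched earlier, the functional view amounts to something like the following; the function name, the idea of keying the collection by sample or family name, and the example names in the comment are illustrative assumptions.

def vilo_query(sample_disassembly, malware_collection):
    # malware_collection: mapping from sample/family name to its k-perm vector
    query = k_perm_vector(opcode_sequence(sample_disassembly))
    return rank_documents(query, malware_collection)

# e.g. vilo_query(new_sample, collection) -> [("FamilyA.gen", 0.90), ("FamilyB", 0.82), ...]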

Vilo: Performance

Response time vs database size
  Search on a generic desktop: in seconds
  Contrast with behavior match (in minutes) and graph match (in minutes)

Vilo Match Accuracy

[ROC curve: true positive rate vs false positive rate for Vilo matching]

Vilo in AV Product

[Diagram: the AV scanner as a chain of classifiers, with Vilo inserted as one more classifier alongside the existing ones]

AV systems are composed of classifiers
Introduce Vilo as one such classifier (a sketch of this integration follows)
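One way that integration could look: the scanner runs its chain of classifiers, and Vilo is simply one more of them. The class names, the 0.75 threshold and the verdict tuple are assumptions for illustration, not the interfaces of any real product.

class ViloClassifier:
    # Wraps the Vilo similarity search as one classifier among many
    def __init__(self, malware_collection, threshold=0.75):
        self.collection = malware_collection
        self.threshold = threshold

    def classify(self, sample_disassembly):
        ranked = vilo_query(sample_disassembly, self.collection)
        if ranked and ranked[0][1] >= self.threshold:
            family, score = ranked[0]
            return ("malware", family, score)
        return ("unknown", None, 0.0)

def scan(sample, classifiers):
    # An AV scanner as a composition of classifiers: the first confident verdict wins
    for clf in classifiers:
        verdict = clf.classify(sample)
        if verdict[0] == "malware":
            return verdict
    return ("clean", None, 0.0)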

Self-Learning AV Product

[Diagram: the scanner's classifiers together with Vilo, which draws on a malware collection held by the product]

How to get a malware collection?
Solution 1: collect the malware detected by the product itself

[Diagram: the same scanner, with Vilo's collection now maintained in the Internet cloud]

How to get a malware collection?
Solution 2: collect, and learn, in the cloud

Learning in the Cloud

[Diagram: Vilo classifiers in the deployed product, with a Vilo Learner in the Internet cloud maintaining the collection]
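The self-learning loop behind Solutions 1 and 2 is a feedback path: whatever the product (or the cloud back end) confidently detects is folded back into the Vilo collection, so the next machine-generated variant is caught by similarity. A minimal sketch of that feedback, reusing the names above; the key format is an arbitrary illustration.

def learn_from_detection(sample_disassembly, family, malware_collection):
    # Solution 1: add each detected sample's feature vector to the local collection
    # (Solution 2 would ship the sample or its vector to a Vilo Learner in the cloud)
    vec = k_perm_vector(opcode_sequence(sample_disassembly))
    key = "%s/auto-%d" % (family, len(malware_collection))   # illustrative naming
    malware_collection[key] = vec
    return key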

Experience with Vilo-Learning

Vilo-in-the-cloud holds promise
  Can utilize a cluster of workstations, much as Google does
  Takes advantage of increasing bandwidth and compute power
Engineering issues to address
  Control the growth of the database
    Forget samples
    Use signature feature vector(s) per family
    Be selective about which features to use

Summary

Weakness of current AV systems
  Exact match over an extract, exploited by creating large numbers of variants
Strengths of Information Retrieval research
  Inexact match over the whole
  Vilo demonstrates that IR techniques have promise
Architecture of a self-learning AV system
  Integrate Vilo into existing AV systems
  Create a feedback mechanism to drive the learning