Self-Learning Anti-Virus Scanner
(Market Case for Program Analysis Technology)

Arun Lakhotia, Professor
Andrew Walenstein, Assistant Professor
University of Louisiana at Lafayette
www.cacs.louisiana.edu/labs/SRL
2008 AVAR (New Delhi)
Introduction

Director, Software Research Lab
- Lab's focus: malware analysis
- Graduate-level course on malware analysis
- Six years of AV-related research
- Issues investigated: metamorphism, obfuscation

Alumni in the AV industry:
- Prabhat Singh, Nitin Jyoti, Aditya Kapoor, Rachit Kumar (McAfee AVERT)
- Erik Uday Kumar (Authentium)
- Moinuddin Mohammed (Microsoft)
- Prashant Pathak (ex-Symantec)

Funded by the Louisiana Governor's IT Initiative.
Outline
- Attack of variants
- AV vulnerability: exact match
- Information retrieval (IR) techniques: inexact match
- Adapting IR to AV: accounting for code permutation
- Vilo: a system using IR for AV
- Integrating Vilo into the AV infrastructure
- Self-learning AV using Vilo
Attack of Variants

Variants vs. family (source: Symantec Internet Threat Report, XI; chart omitted)

Analysis of attacker strategy
- Purpose of the attack of variants: denial of service on the AV infrastructure, and increasing the odds of passing through
- Weakness exploited: AV systems use exact match over an extract
- Attack strategy: generate just enough variation to beat exact match
- Attacker cost: the cost of generating and distributing variants

Analyzing attacker cost
- Payload creation is expensive, so the payload must be reused
- Thousands of variants are needed, so generation must be automated
- General transformers are expensive, so attackers build specialized, limited transformers
- Hence packers/unpackers
Attacker vulnerability
- Automated transformers have limited capability
- Machine-generated variants must exhibit regular patterns

Exploiting the attacker's vulnerability
- Detect the patterns of similarity
- Approaches: information retrieval (this presentation); Markov analysis (other work)

IR Basics
- The basis of Google and of bioinformatics: organizing very large corpora of data
- Key idea: inexact match over the whole
- Contrast with AV: exact match over an extract

The IR problem: given a document collection and a query (keywords or a document), return the related documents.
IR Steps

Step 1: Convert documents to vectors.
1a. Define a method to identify features. Example: k consecutive words.
1b. Extract all features from all documents.
1c. Count the features to make a feature vector.

Example documents: "Have you wondered when is a rose a rose?" and "How about onions? Onion smell stinks." With 3-consecutive-word features, the first document yields features such as "have you wondered", "you wondered when", "wondered when rose", "when rose rose", giving it a count vector like [1, 1, 1, 1, 0, 0] over the combined feature set.

Step 2: Weight the feature vectors, taking the entire corpus into account. The classical method is w = TF x IDF, where TF is the term frequency (the feature's count in the document), DF is the number of documents containing the feature, and IDF is the inverse of DF. The slide's example: DF = 5, 7, 8, 6, 3, so IDF = 1/5, 1/7, 1/8, 1/6, 1/3; with TF(v1) = 1, 2, 5, 3, 0 this gives w1 = TF x IDF = 1/5, 2/7, 5/8, 3/6, 0/3.

Step 3: Compare vectors using cosine similarity.
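Steps 1 and 2 can be sketched in a few lines; the plain k-word-window featurization and the simple IDF = 1/DF weighting follow the slide, while the function names and code structure are illustrative:

```python
from collections import Counter

def k_gram_features(text, k=3):
    """Step 1: count each window of k consecutive words in a document."""
    words = text.lower().replace("?", "").split()
    return Counter(tuple(words[i:i + k]) for i in range(len(words) - k + 1))

def tf_idf(tf, df):
    """Step 2: w = TF x IDF, with TF the raw count and IDF = 1/DF as on the slide."""
    return [t / d if d else 0.0 for t, d in zip(tf, df)]

docs = ["Have you wondered when is a rose a rose?",
        "How about onions? Onion smell stinks"]
grams = [k_gram_features(d) for d in docs]
vocab = sorted(set().union(*grams))              # combined feature set
vectors = [[g[f] for f in vocab] for g in grams]  # one count vector per document

# The slide's numeric example: DF over a larger corpus, TF for document v1.
w1 = tf_idf([1, 2, 5, 3, 0], [5, 7, 8, 6, 3])    # [1/5, 2/7, 5/8, 3/6, 0/3]
```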
Example weighted vector: w1 = [0.33, 0.25, 0.66, 0.50]
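Cosine similarity is the dot product of the two weighted vectors divided by the product of their magnitudes; a minimal sketch:

```python
import math

def cosine_similarity(a, b):
    """cos(a, b) = (a . b) / (|a| |b|): 1.0 for identical directions, 0.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

w1 = [0.33, 0.25, 0.66, 0.50]
score = cosine_similarity(w1, w1)  # identical vectors score 1.0
```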
Step 4: Document ranking, using the similarity measure.

(Diagram: a new document is queried against the document collection; related documents are returned ranked by similarity score, e.g. 0.90, 0.82, 0.76, 0.30.)

Adapting IR for AV

Step 0: Map a program to a document. Consider two variants of the same decoding loop, shown disassembled below; each is converted to its sequence of operations.
l2D2:  push  ecx
       push  4
       pop   ecx
       push  ecx
l2D7:  rol   edx, 8
       mov   dl, al
       and   dl, 3Fh
       shr   eax, 6
       loop  l2D7
       pop   ecx
       call  s319
       xchg  eax, edx
       stosd
       xchg  eax, edx
       inc   [ebp+v4]
       cmp   [ebp+v4], 12h
       jnz   short l305
l144:  push  ecx
       push  4
       pop   ecx
       push  ecx
l149:  mov   dl, al
       and   dl, 3Fh
       rol   edx, 8
       shr   ebx, 6
       loop  l149
       pop   ecx
       call  s52F
       xchg  ebx, edx
       stosd
       xchg  ebx, edx
       inc   [ebp+v4]
       cmp   [ebp+v4], 12h
       jnz   short l18
The extracted operation sequences:

Variant 1: push push pop push rol mov and shr loop pop call xchg stosd xchg inc cmp jnz
Variant 2: push push pop push mov and rol shr loop pop call xchg stosd xchg inc cmp jnz

Step 1a: Define features as k-perms, permutations of k consecutive operations. Abbreviating each mnemonic to one letter:

Virus 1: P P O P R M A S L O C X S X I C J
Virus 2: P P O P M A R S L O C X S X I C J

Because a k-perm ignores the order of operations within its window, the transposed R M A / M A R blocks of the two variants map to the same features.

Step 1 example: 3-perms over the sequences above.

Step 2: Construct feature vectors by counting each k-perm. (The slide tabulates counts of 4-perms such as POPR, OPRM, PRMA, RMAS, MASL, ..., PMAR, MARS across three viruses.)

Step 3: Compare vectors using cosine similarity (as before).
Step 4: Match new sample
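The k-perm featurization of steps 0-2 can be sketched as follows. Treating a k-perm as an order-insensitive window of k consecutive mnemonics (implemented here by sorting each window) is my reading of the slide, not its literal code:

```python
from collections import Counter

def k_perms(ops, k):
    """Count k-perms: each window of k consecutive operations, with the
    order inside the window ignored (here, by sorting the window)."""
    return Counter(tuple(sorted(ops[i:i + k]))
                   for i in range(len(ops) - k + 1))

v1 = "push push pop push rol mov and shr loop pop call xchg stosd xchg inc cmp jnz".split()
v2 = "push push pop push mov and rol shr loop pop call xchg stosd xchg inc cmp jnz".split()

# The transposed rol/mov/and vs. mov/and/rol block maps to the same 3-perm,
# so the two variants share almost all of their features.
shared = set(k_perms(v1, 3)) & set(k_perms(v2, 3))
```

An exact 3-gram feature would treat the transposed blocks as entirely different; the sorted window is what makes the match robust to this kind of code permutation.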
Vilo: A System Using IR for AV

Vilo functional view: a new sample is queried against the malware collection, and matching malware is returned ranked by similarity score (e.g. 0.90, 0.82, 0.76, 0.30).

Vilo in action: query match (screenshot omitted).

Vilo performance
- Response time vs. database size: a search on a generic desktop completes in seconds
- Contrast: behavior matching and graph matching take minutes

Vilo match accuracy: ROC curve of true positives vs. false positives (figure omitted).

Vilo in an AV Product
(Diagram: the AV scanner is a pipeline of classifiers, with Vilo introduced as one of them.)

AV systems are composed of classifiers; introduce Vilo as a classifier.

Self-Learning AV Product
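The idea of the scanner as a pipeline of classifiers, with Vilo slotted in as one more, might be sketched like this; the names and interfaces are illustrative, not Vilo's actual API:

```python
from typing import Callable, Optional

# A classifier inspects a sample and returns a verdict, or None to pass it on.
Classifier = Callable[[bytes], Optional[str]]

def scan(sample: bytes, classifiers: list) -> str:
    """Run the sample through the pipeline of classifiers; first verdict wins."""
    for classify in classifiers:
        verdict = classify(sample)
        if verdict is not None:
            return verdict
    return "clean"

def exact_signature(sample: bytes):
    # Stand-in for a conventional exact-match classifier.
    return "known-malware" if b"EVIL-MARKER" in sample else None

def vilo_classifier(sample: bytes):
    # Stand-in for Vilo's inexact, similarity-based match over k-perm vectors.
    return None

pipeline = [exact_signature, vilo_classifier]
```

Because each classifier shares the same interface, Vilo can be added to an existing product without restructuring the scanner.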
Solution 1: How to get a malware collection? Collect the malware detected by the product itself.

(Diagram: the product's classifiers, Vilo among them, feed detected samples back into the local Vilo collection.)
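Solution 1's feedback loop, where every sample the product detects is added to the local Vilo collection, could be sketched as follows; the in-memory list standing in for Vilo's database is purely illustrative:

```python
class SelfLearningScanner:
    """Sketch of Solution 1: every detection feeds the local Vilo collection."""

    def __init__(self, classifiers):
        self.classifiers = classifiers
        self.vilo_collection = []  # samples detected so far (the training feed)

    def scan(self, sample: bytes) -> str:
        for classify in self.classifiers:
            verdict = classify(sample)
            if verdict is not None:
                # Feedback step: remember the detected sample so the Vilo
                # classifier can later match its near-variants inexactly.
                self.vilo_collection.append(sample)
                return verdict
        return "clean"

# Illustrative exact-match classifier driving the feedback loop.
scanner = SelfLearningScanner([lambda s: "malware" if b"EVIL" in s else None])
scanner.scan(b"..EVIL..")  # detected, added to the collection
scanner.scan(b"benign")    # clean, not added
```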
Solution 2: How to get a malware collection? Collect and learn in the cloud.

Learning in the Cloud

(Diagram: deployed products report through the Internet cloud to a central Vilo learner, which pushes updated data back to the Vilo classifier in each product.)
Experience with Vilo-Learning
- Vilo-in-the-cloud holds promise
- It can utilize a cluster of workstations, as Google does
- It takes advantage of increasing bandwidth and compute power
- Engineering issues to address:
  - Control the growth of the database (forget samples; use signature feature vector(s) per family)
  - Be selective about which features to use

Summary
- Weakness of current AV systems: exact match over an extract, exploited by attackers who create large numbers of variants
- Strength of information retrieval research: inexact match over the whole
- Vilo demonstrates that IR techniques hold promise for AV
- Architecture of a self-learning AV system: integrate Vilo into existing AV systems and create a feedback mechanism to drive learning