Malicious Client Detection Using Machine Learning
Satyam Saxena
Threats
• There are many types of malware for all types of devices and operating systems.
• Most, if not all, malware relies on a support system: command and control (C&C) infrastructure.
• Bad guys use DNS to scale and hide their C&C infrastructure.
• Bad guys use DNS for C&C to bypass corporate security (tunneling).
• Bad guys use cloud providers to roll out, scale, manage, and quickly move their C&C infrastructure.
Without relying on any particular endpoint operating system or configuration, we can apply big data analytics to network data to detect malware.
Malware use of DNS
[Diagram: an endpoint device resolves rndruppbakyokv[.]com to 1.2.3.4 through DNS servers and reaches the command and control infrastructure.]
A communication channel with the C&C is established. The compromised device receives updates, instructions, and targets.
Architecture
[Diagram: Machine Learning Pipeline. Raw pDNS feeds a Domain Name classifier (Malicious Domains), a DNS Resolver classifier (Malicious Resolvers), and a Device Behavior classifier (Behavior Anomalies), which feed a Compromised Device (Security Event) classifier. Model labels: DGA, Tunnel, Network, Time.]
DGA Model
• Detects randomly generated domains in the pDNS data.
• The model is trained on 6 malware families, such as Zeus, Tinba, and Pushdo.
• 29 features are extracted from each domain.
• PCA reduces the 29 features to 16.
• The reduced feature set is then used to train a GBM classifier.
Domain Features
• Common letter score
• Entropy
Domain Features (2)
• Length of largest meaningful string
• Mean length of dictionary words
DGA Features
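Below is a minimal sketch of the four domain features named above. The deck does not define them precisely, so every definition here is an assumption. Entropy is the Shannon entropy of the second-level label. The common letter score is the fraction of characters drawn from a hypothetical set of frequent English letters. The two dictionary features search for words from a tiny hypothetical word list inside the label. The helper names (`domain_features`, `DICTIONARY`, `COMMON_LETTERS`) are illustrative and do not come from the original work.

```python
import math
from collections import Counter

# Hypothetical toy dictionary; a real model would use a large English word list.
DICTIONARY = {"face", "book", "google", "mail", "news", "shop", "bank", "cloud"}
# Assumption: "common letters" means the most frequent English letters.
COMMON_LETTERS = set("etaoinshr")


def second_level_label(domain: str) -> str:
    """Return the label left of the TLD, e.g. 'facebook' for 'www.facebook.com'.
    Naive: multi-part suffixes such as 'co.uk' are not handled."""
    parts = domain.lower().rstrip(".").split(".")
    return parts[-2] if len(parts) >= 2 else parts[0]


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def common_letter_score(s: str) -> float:
    """Fraction of characters that are common English letters (assumed definition)."""
    return sum(ch in COMMON_LETTERS for ch in s) / len(s)


def dictionary_words(s: str) -> list[str]:
    """All dictionary words that occur as substrings of s."""
    return [w for w in DICTIONARY if w in s]


def domain_features(domain: str) -> dict[str, float]:
    label = second_level_label(domain)
    words = dictionary_words(label)
    return {
        "entropy": shannon_entropy(label),
        "common_letter_score": common_letter_score(label),
        "largest_meaningful_string": max((len(w) for w in words), default=0),
        "mean_dictionary_word_length": (
            sum(len(w) for w in words) / len(words) if words else 0.0
        ),
    }
```

For example, `domain_features("facebook.com")` yields entropy 2.75, a common letter score of 0.5, and a largest meaningful string of 4 ("face" or "book"). `domain_features("rndruppbakyokv.com")` finds no dictionary words at all, which is the kind of contrast the DGA model relies on.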
DGA Classification Performance
Overall model performance (Random Forest)
Metric      Performance
Accuracy    98.738%
Precision   99.288%
Recall      98.181%
AUC         99.801%
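Accuracy, precision, and recall follow their standard definitions. The sketch below computes them from a binary confusion matrix in which 1 means DGA and 0 means benign. It is an illustration and not the deck's evaluation code. It also reads the per-family "% Detection" figures as recall restricted to one family's domains, which is an assumption. The function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion matrix; label 1 = DGA (malicious), 0 = benign."""
    tp: int
    fp: int
    tn: int
    fn: int

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def precision(self) -> float:
        # Of the domains flagged as DGA, the share that really are DGA.
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Of the real DGA domains, the share that were flagged.
        return self.tp / (self.tp + self.fn)


def confusion(y_true: list[int], y_pred: list[int]) -> Confusion:
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(t == 1 and p == 1 for t, p in pairs),
        fp=sum(t == 0 and p == 1 for t, p in pairs),
        tn=sum(t == 0 and p == 0 for t, p in pairs),
        fn=sum(t == 1 and p == 0 for t, p in pairs),
    )


def detection_rate_by_family(families: list[str], y_pred: list[int]) -> dict[str, float]:
    """Per-family recall: share of each family's DGA domains predicted as DGA (1)."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for family, pred in zip(families, y_pred):
        totals[family] += 1
        hits[family] += pred
    return {family: hits[family] / totals[family] for family in totals}
```

With made-up labels, `confusion([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0])` gives accuracy 0.875, precision 1.0, and recall 0.75.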
Performance per malware family
Malware Family   % Detection
Conficker        86.309%
Cryptolocker     98.348%
Pushdo           95.515%
Ramdo            99.823%
Tinba            96.715%
Zeus             100.0%
Network Model
• Uses WHOIS records to determine whether a domain is malicious or benign.
• A WHOIS record contains very rich information about a domain.
• Age-based features.
• Registration features.
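The age-based features can be derived from the WHOIS creation, update, and expiration dates. The sketch below computes them in days relative to the time a query was observed. The exact feature definitions used by the model are not given, so the names and the choice of reference date are assumptions. Parsing WHOIS date strings is left out because their formats vary between registries.

```python
from datetime import date


def whois_age_features(
    creation: date, updated: date, expiration: date, observed: date
) -> dict[str, int]:
    """Hypothetical age-based WHOIS features, all in days."""
    return {
        # Time since the domain was registered.
        "age_days": (observed - creation).days,
        # Time since the WHOIS record was last updated.
        "days_since_update": (observed - updated).days,
        # Time left before the registration expires.
        "days_until_expiration": (expiration - observed).days,
        # Total length of the registration period.
        "registration_length_days": (expiration - creation).days,
    }
```

Consider a domain registered on 2016-01-01 for one year and seen on 2016-01-31. Its age is 30 days, 336 days remain before expiration, and the registration period is 366 days (2016 is a leap year).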
Network Features – WHOIS Server
[Figure: WHOIS server, malicious domains vs. benign domains]
Network Features – Creation Date
Network Model Performance
• Final set of features: creation date, update date, expiration date, admin country, registrant country, tech country, status, WHOIS server
Metric             Performance
Error              0.00450864127
Area Under Curve   0.96615884041
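The Area Under the ROC Curve equals the probability that a randomly chosen malicious domain receives a higher score than a randomly chosen benign one, with ties counted as one half. The short sketch below computes AUC directly from that definition. It is quadratic in the number of samples and meant only to show what the metric measures. The function name and the example scores are hypothetical.

```python
def auc(scores: list[float], labels: list[int]) -> float:
    """ROC AUC via pairwise comparison (Mann-Whitney U / (n_pos * n_neg)).
    labels: 1 = malicious, 0 = benign; scores: higher means more malicious."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))
```

For example, `auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])` is 0.75, because three of the four positive/negative pairs are ordered correctly.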
Compromised Client Detection
[Diagram: pDNS data stored in Hadoop HDFS is grouped by client IP with Spark compute, producing per-IP counts]
IP    DGA   WHOIS  NX   SERVER
ip1   #10   #3     #4   #5
ip2   #8    #1     #2   #3
ip3   #5    #2     #0   #0
ip4   #3    #3     #0   #0
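The per-IP table is a group-by over pDNS records keyed on the client IP, with one count per detector. The plain-Python sketch below shows this aggregation. In Spark it would be a `groupBy` on the client IP column followed by per-column sums. The record schema is hypothetical, and so are the meanings assumed for NX (an NXDOMAIN response) and SERVER (a flagged resolver). The ranking rule is also a hypothetical choice: it sorts clients by their total flagged queries.

```python
from typing import NamedTuple

COLUMNS = ("dga", "whois", "nx", "server")


class Query(NamedTuple):
    """One pDNS record reduced to per-query flags (hypothetical schema)."""
    client_ip: str
    dga: bool     # queried domain flagged by the DGA model
    whois: bool   # queried domain flagged by the WHOIS network model
    nx: bool      # query returned NXDOMAIN (assumed meaning of NX)
    server: bool  # resolver used was flagged (assumed meaning of SERVER)


def counts_per_client(queries: list[Query]) -> dict[str, dict[str, int]]:
    """Group by client IP and count flagged queries per column."""
    table: dict[str, dict[str, int]] = {}
    for q in queries:
        row = table.setdefault(q.client_ip, dict.fromkeys(COLUMNS, 0))
        for col in COLUMNS:
            row[col] += int(getattr(q, col))
    return table


def rank_clients(table: dict[str, dict[str, int]]) -> list[str]:
    """Order client IPs by total flagged queries, highest first (hypothetical rule)."""
    return sorted(table, key=lambda ip: sum(table[ip].values()), reverse=True)
```

A simple sum treats every detector equally. A downstream classifier, such as the Compromised Device classifier in the architecture, could instead learn how much weight each column deserves.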
Thank You