Panel discussion by:
Mark Vella, Wayne Grixti, Sergio Pisani
#AImt
Contents
• A data-centric paradigm
• Concerns and challenges
• Governance
A data-centric paradigm
Artificial Intelligence (AI)
• Loosely defined:
  – Algorithmic solutions to complex problems typically solved by humans
  – Characteristics
    • Learning, autonomy, adapting to their environment
• Machine Learning
  – Main contributor to AI’s recent achievements
  – Has been around for quite a while in the form of
    • Statistical analysis
    • Pattern recognition
  – Advances in technology served as a catalyst
• The role of the Data Scientist took center stage
Data Science for Cyber Security
[Diagram: Dataset / Data stream … -> ML -> Classification/Regression Program]
• Example questions
  – Spam or ham?
  – Ransomware, info-stealer, adware or benign?
  – Predicted CPU util. %: anomalous?
• Data scientist tasks
  – Understand/Visualize/Pre-process
  – Select/configure/evaluate
  – Compute/Search an optimized…
  – Un/Semi/Supervised; Math/Instance-based/Deep models
• Example datasets
  – Spam Dataset
  – Network Traffic Analysis Dataset
  – Malware Dataset
Feature Engineering i
• Dataset -> Feature Vectors
• Spam: Bag of words
  <'academic', 'academy', 'acatihdlihdpbgwgcmvmdw5kihlvdxigywnjb3vudc4gd2l0a..
  <0, 16, 32, 8, 0, 0, 0, …..>
• Network traffic analysis: Connection statistics
  @attribute 'duration' real
  @attribute 'protocol_type' {'tcp','udp', 'icmp'}
  @attribute 'service' {'aol', 'auth', 'bgp', 'courier', ..
  @attribute 'flag' { 'OTH', 'REJ', 'RSTO', 'RSTOS0', 'RSTR', 'S0', 'S1', 'S2', 'S3', 'SF', 'SH' }
  @attribute 'src_bytes' real
  @attribute 'dst_bytes' real
  <0,tcp,ftp_data,SF,491,0,0,.... >
Feature Engineering ii
• Malware: Instruction counts
  <and,lea,xor,sub,jmp,mov,pop,test,add,call,ret,jne,push,je,inc,shl,or,cmp …. >
  <195,348,354,177,280,2451,600,309,392,529,22,248,1620,388,214,80,98,514 … >
• Taking into account
  – Biased datasets
  – Missing data
  – Inaccurate labels
  – Insufficient, non-representative data
  – Concept drift
  – Attributes with a different scale
  – Categorical or sparse values
  – Etc ...
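The feature-engineering steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the vocabulary, the sample message and the src_bytes column are hypothetical toy values rather than data from the spam or network traffic datasets, and the helper names are invented for this sketch.

```python
from collections import Counter

# Hypothetical toy vocabulary; a real one is derived from the whole dataset.
VOCAB = ["academic", "academy", "account", "free", "winner"]

def bag_of_words(text, vocab=VOCAB):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]

def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector, one slot per category."""
    return [1 if value == c else 0 for c in categories]

def min_max_scale(column):
    """Rescale a numeric column to [0, 1] so attributes share one scale."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

# Spam: message -> word-count vector (hypothetical message).
spam_vector = bag_of_words("Free account winner claim your free prize")

# Network traffic: categorical 'protocol_type' -> one-hot vector.
protocol_vector = one_hot("tcp", ["tcp", "udp", "icmp"])

# Attributes with a different scale: hypothetical 'src_bytes' column.
scaled_src_bytes = min_max_scale([491, 0, 232, 1000])

print(spam_vector, protocol_vector, scaled_src_bytes)
```

The one-hot and scaling helpers address two items from the list above (categorical values and attributes with a different scale). Missing data, label noise and concept drift need separate treatment and are not covered here.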
Models as decision boundaries
Learning to predict ...
… using an optimization procedure
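As a rough illustration of a model as a decision boundary learned by an optimization procedure, the sketch below fits a two-feature logistic regression with batch gradient descent. The choice of logistic regression, the toy points, the learning rate and the epoch count are all assumptions made for this example. The slides do not name a specific model or optimizer.

```python
import math

def sigmoid(z):
    """Squash a real-valued score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(points, labels, lr=0.5, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the log loss."""
    w, b = [0.0, 0.0], 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for (x1, x2), y in zip(points, labels):
            err = sigmoid(w[0] * x1 + w[1] * x2 + b) - y
            grad_w[0] += err * x1
            grad_w[1] += err * x2
            grad_b += err
        # Step against the average gradient.
        w[0] -= lr * grad_w[0] / n
        w[1] -= lr * grad_w[1] / n
        b -= lr * grad_b / n
    return w, b

def predict(w, b, point):
    """The decision boundary is the line w[0]*x1 + w[1]*x2 + b = 0."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else 0

# Hypothetical toy data: label 0 = benign, label 1 = malicious.
points = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4),
          (0.8, 0.9), (0.9, 0.7), (0.7, 0.8), (0.9, 0.9)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(points, labels)
print([predict(w, b, p) for p in points])
```

Prediction only checks which side of the learned line a point falls on. The optimization step is what places that line.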
OK, not the Robots Revolt
• yet!
• But numerous challenges abound
Concerns and Challenges
The application of partly autonomous
algorithms in cybersecurity is not entirely
new, although traditionally those systems were
usually not referred to as ‘Artificial Intelligence’.
Cybersecurity controls capable of functioning autonomously
and taking intelligent decisions to protect information
systems and services have existed for quite some time, for
instance to decide whether or not to allow a certain network
communication, to autonomously filter spam messages, or
to adapt to new circumstances such as the identification of
previously unseen forms of cyber-attacks.
Since then the field of Cybersecurity has
undergone rapid transformation due to the
developments in machine learning (ML), deep
learning (DL), and reinforcement learning (RL), which have
resulted in notable successes in addressing
computer vision, Natural Language Processing
(NLP) and autonomous decision-making.
Challenges
• Clear abuse of AI systems to enhance cyber-attacks
and malicious use of the technology,
resulting in more numerous and more extensive
cyber-attack vectors.
• Increased difficulty in attributing attacks to
a specific actor.
• The targeting of human vulnerabilities through
autonomous social engineering, social media
and propaganda manipulation.
• Attacks on cyber-physical systems such as
autonomous vehicles, or the development of
autonomous weapon systems.
• Deep fakes – Forgery of voice and images
• DDoS packets created to mimic user activity
• A machine learning-based phishing attack
generator reported to have increased the
penetration success rate from 0.3% to 15%
• Security mechanisms such as CAPTCHA
are being bypassed
• Building smarter password guessers through
deep learning on leaked data sets
• AI, especially DL models, already outperforms
humans in multiple tasks.
• Introducing empathy as an algorithm
With an increasing number of AI systems
employed in cybersecurity, not only are we
making progress, but we are also introducing new
vulnerabilities that open the window to new
types of attacks.
New Vulnerabilities
A key threat to machine learning is data poisoning.
If attackers can figure out how an algorithm is set up,
or where it draws its training data from, they can figure
out ways to introduce misleading data that builds a
counter-narrative about what content or traffic is
legitimate versus malicious.
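A minimal sketch of data poisoning, assuming a deliberately simple nearest-centroid spam filter over a single hypothetical feature (for example, the number of suspicious links in a message). The classifier, the feature and every value below are illustrative assumptions and do not describe any specific system. The attacker injects spam-like samples labelled as ham, which drags the ham centroid towards spam.

```python
def train_centroids(samples):
    """samples: list of (feature_value, label). Returns label -> mean value."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def classify(centroids, value):
    """Assign the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

# Clean training data (hypothetical).
clean = [(0, "ham"), (1, "ham"), (1, "ham"), (2, "ham"),
         (7, "spam"), (8, "spam"), (9, "spam"), (10, "spam")]

# Poison: spam-like feature values deliberately mislabelled as ham.
poison = [(8, "ham"), (8, "ham"), (9, "ham"),
          (9, "ham"), (8, "ham"), (9, "ham")]

clean_model = train_centroids(clean)
poisoned_model = train_centroids(clean + poison)

suspicious_message = 6
print(classify(clean_model, suspicious_message))     # trained on clean data
print(classify(poisoned_model, suspicious_message))  # trained on poisoned data
```

With the clean data the message with value 6 is closer to the spam centroid. After poisoning, the shifted ham centroid claims it, which is the "counter-narrative" effect described above.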
Harnessed power - The Tool Box
• Digital Forensics
• Facial Recognition
• Predictive policing
Digital Forensics
• Use of biometric systems for investigative purposes.
  – Analysis of unordered data volumes
  – Text analysis
  – Image
  – Audio
Facial recognition
A much-awaited development at the London Met, which is meeting
staunch opposition from privacy advocates
notwithstanding that 8 major trials have been carried out
since 2016. Results indicate that only 1 in 1000 is
innocently pinged. If we consider the number of
criminals that will face justice, is this a matter of ‘a
compromise for a greater good’?
Predictive applications
Today’s machines are interpreting data,
recognising patterns and thus recommending
more efficient ways to achieve desired
outcomes.
Underlying all of this, we need a strategy that makes
use of AI technology to prevent and
counter cybercrime in order to keep pace with
criminal trends and predictions.
Governance
Malta AI Strategy
• Actions to mitigate cybersecurity risks
• Intersection of AI and cybersecurity
• Framework to create trustworthy AI
MDIA AI Certification Programme
• Platform for practitioners & companies
• Based on Malta Ethical AI Framework
• Valuable recognition in the market
Malta’s AI Ethical Framework
• Governance and control practices
• Assess potential forms of attack
• Risks due to unintentional behaviour
Process Map
Conclusion
• AI will bring a lot of benefits
• Still, risks need to be addressed
• Instil trust through tech assurance
Open Discussion
Credits
• Chio, C., and Freeman, D. (2018). Machine Learning and Security: Protecting Systems with Data and Algorithms. O'Reilly Media, Inc.
• Géron, A. (2017). Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O'Reilly Media, Inc.
• PoweredTemplate.com