BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015

Preview:

Citation preview

BE ready. BE safe. BE secure.

Bsides Lisbon 2015 Security data Metrics and measurements at

scale1

binaryedge.io

Who

TIAGO HENRIQUES

• BSc Software Engineering / University of Brighton

• MSc Computer Security and Forensics / University of Bedfordshire

• 8 Years experience in Information Security consultancy, leadership and research

CEO and Founder @ BinaryEdge

TIAGO MARTINS

• BSc and MSc Computer Science / University of Lisbon

• 7 Years experience developing real-time systems and high-volume data processing

CTO and Co-Founder @ BinaryEdge

ROBERTO BARBOSA

• More than 20 years on the IT sector

• ex-Engineer at Sun Microsystems

• Former Philip Morris corporate Auditor

• expert on High Scalability and Availability on the Finance Sector (UBS, Citigroup and Leonteq) and mobile startup.

COO and Head of DataScience at BinaryEdge

2

binaryedge.io

WhereEurope

SWITZERLANDHeadquarter

Market

Management

PortugalWorkforce

United KingdomMarket

Workforce

3

binaryedge.io

What

Machine Learning Data Mining

security

data

data

data

data

4

binaryedge.io

NEW COMMODITY

DATA IS THE NEW OIL

5

binaryedge.io

NEW CURRENCY

6

binaryedge.io

DATA BUSINESS MODEL

Gather and Sell raw Data

collect supply

store host

filter refine

enhance enrich

simplify access

consult advise

Hold onto someone else’s data for them

Strip out problematic records or data fields or release interesting data subsets

Blend in other datasets to create a new and interesting picture

Help people cherry-pick the data they want in the format they prefer

Provide guidance on others’ data efforts

7

binaryedge.io

ORGANIsATION

8

binaryedge.io

BECOMING EXPONENTIAL ORGANIZATION

SIDEAS

CALE

Staff on Demand

MASSIVE TRANSFORMATIVE PURPOSE

MTP

MARKET

Interfaces

Dashboards

Experimentation

Autonomy

Social

Community and Crowd

Algorithms

Lease Assets

Engagement

LEFT BRAIN

ordercontrolstability

RIGHT BRAIN

creativitygrowthuncertainty

9

binaryedge.io

BusinessOperations

Product

Security

DEVdata

science

UI

sysops

data

agent

UX

backend

product

owner

frontendmobile

information

retrievalInternet

of

Things

Internet

of

Money

Internet

of

People

Internet

of

Content

machine

learning

math/stats

design

Tester

QA

quality

support

security

experts

security

experts

audits

devops

marketing

human

resources

finance

sales

sales rep

sales rep

assistant

assistant

social

media

consulting

ORGANISATION RELATIONSHIP

10

binaryedge.io

Goals Requirements Results• Easy to UNDERSTAND • understandability Simple Architecture

• EASY TO extend • extensibility Loosely Coupled Services

• EASY TO change • changeability Built for replacement

• EASY TO replace • replaceability Self-dependency

• EASY TO deploy • deployability Immutability

• EASY TO scale • scalability Responsibility Segretation

• EASY TO recover • resilience Decoupling and Isolation

• EASY TO connect • uniform interface API based

• EASY TO afford • cost efficienT On-demand computing

DATA ARCHITECTURE DESIGN

11

binaryedge.io

PHASEIMPORTANCEEFFORTMILESTONESless

MOREGREATER

average

milestone

intelligencedata information knowledge

July August September October November December 2015

sensor agent(minions)

backend feed API User API data visualizationscalability

machine learning threat

classification

deeplearning

storage &archiving

search &classification

data analyticsimage

processingpredictiveanalytics

POC data process analysis intel

PRODUCT IMPROVEMENTS 2015

12

binaryedge.io

ENGINEERING

13

No legacy to maintain Lots of experience in the team Lots of technologies to pick from Micro service based approach

Metrics collection at large scale

Very young startup

Technologies? Architecture? Prototype?

but where to start?

14

Metrics collection at large scale

15

Architecture

Focus on architecture

simple resilient scalable replaceable components

Technology independent

16

architecture overview

17

HTTP API Command line clients Modules

• Python • NodeJS • Go

Third-party APIs

architecture - job request

API oriented

job types Data Collection Data Processing / Analytics

18

Agents listen for work in channels technologies Multiple types of agents

Agents

architecture - job execution

GO Python NodeJS Scala Java

RabbitMQ NSQ

Redis Apollo

job control

19

ActiveMQ NATS Kafka Kestrel NSQ RabbitMQ Redis QPID HornetQ Apollo

http://bravenewgeek.com/dissecting-message-queues/

architecture - job execution

Messaging

Broker

ActiveMQ

20

http://bravenewgeek.com/dissecting-message-queues/

architecture - job execution

zeromq nanomsg

Messaging

Brokerless

21

architecture - job execution

Amazon Microsoft Google realtime.co …

Messaging

Cloud

22

Agents can feed other agents

Different types of enrichment • Clean data • Process data • Alarms

architecture - data enrichment

23

All information is stored • RAW data • Processed data Geolocate of information Encrypted data for each client Data Storage

Cloud Services • Amazon S3 • Amazon DynamoDB • AzureDocumentDB • Azure Storage • Google Cloud Storage • Google BigQuery • Rackspace Cloud Files • Constant Cloud Storage • Skylable • RunAbove

architecture - store

Database Solutions • MongoDB • ElasticSearch • Cassandra • Riak • LUCEne

24

Delivering data • Realtime - Streaming • Storage for Analytics • API • Raw

Data Analytics • Kibana • InfluxDB • Druid

architecture - serving

25

Data Processing • Apache Spark • Hadoop • Amazon Kinesis

Data Intelligence • Amazon Machine Learning/EMR • Google Prediction API • Azure Machine Learning

architecture - serving

26

Our agents are very simple • Simple tasks • Easy to maintain and adapt

agents/ minions

Agents can be located/run anywhere • Geo distribution • Clouds • Dedicated Servers • Raspberry Pis in Tiago Henriques’ dual gbit connection

27

New Relic Logentries Server Density Cloud watch Grafana logstash

monitoring

28

Ansible Puppet Docker Saltstack etcd

deployment

saltstack.com 29

binaryedge.io

Machine Learning

30

binaryedge.io

CHALLENGES IN DATA MINING

MODELLING LARGE SCALE NETWORKS

DISCOVERY OF THREATS

Network dynamics and Cyberattacks

Privacy Preservation in data mining

31

ip addressurl address

linked urls

internal

external

Company

registration

email

people

phonesocial

search

photos

family&friend

behavior

news

forums

sub-reddits

topics

likes

metadata

BGP

co-hosted sites

shared infrastructure

AS membership

AS Peer

list of IPs

AS

whois

contact

geolocation

phone

social networks

office locations

portscan

services

web

http https certificate configuration authorities entities

web server framework headers cookies

screenshots

dns

domains AXFR MX records

banners

image classifier

threat

SMB

VNC

RDP

files

files apps

SWusers

OCR

data pointscontingency

irrelevant

© 2015 binaryedge.io

malware

32

binaryedge.io

Machine Learning techniques

• Artificial Neural Network (ANN)

• Support vector machine (SVM)

• Decision trees

• bayesian networks (BNS)

• K-Nearest neighbour (knn)

• Hidden Markov Model (HMM)

33

binaryedge.io

Machine Learning - Why?

Classification

Detection

Clustering

Automation

correlation

prediction

analysis

34

binaryedge.io

Measurements on our own data

Support - Indicates which percentage of data on storage shows correlation

Confidence - Indicates probability of our assumption being correct

35

binaryedge.io

Improving our own data

• Kalman Filter

• AdaBoost (Adaptive Boost)

SAMPLE 1 SAMPLE 1.2 SAMPLE 1.3 SAMPLE 1.4

DEEPER DATA POINT SUPPORT

MANUAL CLASSIFICATION

Weight 6/10 Portscan

Weight 3/10 GEolocation

Weight 5/10 OCR screenshot

Weight 9/10 Previous known

Correct data

36

binaryedge.io

DATA chain

CollectionData

Processing ML of Data Report Storage

37

binaryedge.io

CYBER INNOVATION LOOP

Observations

guide & control

cultural

IDENTITY

new information

previous experience

analysis &

synthesisDecisions Action

feedback

feedback

interaction with

environment

interaction with

environment

SECURITY

FEEDS

REAL WORD DATA

RWD

MODELS

guide & control

observe ORIENT DECIDE act

feed forward

feed forward

feed forward

38

binaryedge.io

CYBER INNOVATION LOOP

INFORMATION

HYPOTHESISdirectives

facts classification

resolution

ASSE

SSM

ENT

enac

tmen

t

knowledge

data

• classification knowledge transforms fact to information • assessment knowledge transforms information to hypothesis • resolution knowledge transforms hypothesis to directive • enactment knowledge transforms directive to fact

39

binaryedge.io

CYBERSECURITY DATA SCIENCE

TELEMETRY

SENSOR DATA

CONTEXTUAL DATA

HISTORICAL DATA

REAL TIME PREDICTIONS AND DECISIONS

agents

agents

REAL WORD DATA

RWD

RECOMMENDERCLASSIFIER SOCIAL THREAT FRAUD

features MODELS VALUE

data INTELLIGENCEdata engineering

custom

dashboard

40

binaryedge.io

DEMO

41

binaryedge.io

DEMO

42

binaryedge.io

DEMO

43

binaryedge.io

DEMO

44

contingency irrelevantthreat safe

BE ready. BE safe. BE secure.

BINARYEDGE.IO

Finsterrütistrasse 4, 8134 Adliswil, ZURICH

Switzerland

+ 41 78 632 32 90 Email : th@binaryedge.io

www.binaryedge.io

45

Recommended