22
Five-sigma Network Events (and how to find them) John O'Neil Edgewise Networks Halloween, 2018

Five-sigma Network Events (and how to find them)

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Five-sigma Network Events (and how to find them)

Five-sigmaNetworkEvents(andhowtofindthem)

JohnO'NeilEdgewiseNetworksHalloween,2018

Page 2: Five-sigma Network Events (and how to find them)

Networks are Complex

�2

• Nooneknowswhat'sgoingon

Page 3: Five-sigma Network Events (and how to find them)

Finding the Strange & Unusual

�3

• Orthenew&unexpected🎃

• …andifit’sdifferent,itmightbebad.☠

• Outlier—Improbabledatapointintheexpecteddistribution🦔

• Anomaly—Datapointgeneratedbyadifferentdistribution👽

Page 4: Five-sigma Network Events (and how to find them)

Mr. Splanky

�4

Page 5: Five-sigma Network Events (and how to find them)

Anomaly & Outlier Tools

�5

“Ifyouwantsomethingdoneright,doityourself.”—Charles-GuillaumeÉtienne

Page 6: Five-sigma Network Events (and how to find them)

Using Python

�6

• Interpretablepseudocode

• Maturelibraries.

• Easytoinstall

• Fastenough😀

Page 7: Five-sigma Network Events (and how to find them)

Creating Tools for Outlier Detection

�7

• IntroducingafewtoolswritteninPython

• Intendedtoanswerinterestingquestionsandscalewell

• Easytomodify/improvetosatisfyyourcuriosity

• Astartingpointforyourowntools

• Codeisavailableat:

http://github.com/EdgewiseNetworks/five-sigma

Page 8: Five-sigma Network Events (and how to find them)

Discover Bad Things Before Big Problems

�8

• Keeptrackofnetflowsacrossmachinesandacrosstime

• Wellenoughtorecognizeunusualthings

• Buttoomuchinformation

• Andmakeittunable

Page 9: Five-sigma Network Events (and how to find them)

Standard Deviation

�9

5σ ≈ 10−6

3.29σ ≈ 10−3

Theamountof“spread”ina(usuallyGaussian)distribution.

Page 10: Five-sigma Network Events (and how to find them)

Project Overview

�10

1. Createafeedoftypicalnetflows• Basedonrealnetflowsbutanonymized• Format:timestamp,src_ip,src_port,dest_ip,dest_port,flow_count

2. Createaconsumerforthesenetflows.

3. Createanumberofconsumertoolstotrackinterestingstatistics.• Standarddeviations• Updateperiod

Page 11: Five-sigma Network Events (and how to find them)

Examples Of Useful Information

�11

1. DoesanIPaddresskeepscanningfornewopenports?

2. DidanIPaddresssuddenlygetalotbusierthanit’severbeeninthepast?

3. DidanIPaddresssuddenlygetalotbusierthananyotherIPaddress?

4. Shouldn’tthisIPaddresshavestoppeddoingnewthingsbynow?

Page 12: Five-sigma Network Events (and how to find them)

Tools To Use: Sketching & Streaming

�12

• Lotsofdatatokeeptrackof

• Butwe'reonlyinterestedincertainaspectsofit- Setcardinality—HyperLogLog- Incrementalmeans&standarddeviations- Onlinelinearregression

• Makebigdataintosmalldata

Page 13: Five-sigma Network Events (and how to find them)

Other Examples of Approximate Probabilistic Sketches

�13

• BloomFilter(setmembership)

• Count-MinSketch(countingitems)

• MinHash(setintersection)

• Locality-SensitiveHashing(LSH:nearestneighbors)

• Q-digest/T-digest(quantiledistribution—MOREABOUTTHISLATER)

Page 14: Five-sigma Network Events (and how to find them)

IpPortScanDetector

�14

Q:DoesanIPaddresskeepscanningfornewopenports?

Contains:{IP_address:HyperLogLog}mapEachHLLcountsdistinctIP:portdestinations.

At each period:mean, sigma = Stdev(hll.cardinality() for every HLL)For each IP_address & HLL :if HLL.cardinality() > N sigmas above the mean:report it.

Page 15: Five-sigma Network Events (and how to find them)

GrowthDetector

�15

Q:DidanIPaddresssuddenlygetalotbusierthanit’severbeeninthepast?

Contains:{IP_address:HyperLogLog}—periodCardinalityMapEachHLLcountsdistinctIP:portdestinationsoveralltime.

{IP_address:StdDev}—periodStatisticsMapEachStdDevincrementallycalculatesmeansandstdevs.

At each period:For each IP_address & HLL & StdDev:currCount = HLL.cardinality()mean, sigma = StdDev.getMeanAndStdev()if currCount > N sigmas above its mean:report it.

HLL.clear()StdDev.add(currCount, current_period)

Page 16: Five-sigma Network Events (and how to find them)

ExplosionDetector

�16

Q:DidanIPaddresssuddenlygetalotbusierthananyotherIPaddress?

Contains:{IP_address:HyperLogLog}—periodCardinalityMapEachHLLcountsdistinctIP:portdestinationsincurrentperiod.

At each period:mean, sigma = Stdev(hll.cardinality() for every HLL)For each IP_address, hll:curr = hll.cardinality()if curr > N sigmas above the mean:report it.

hll.clear()

Page 17: Five-sigma Network Events (and how to find them)

Host Stabilization

�17

• Assumean“exponentialdecay”ofnewIP:portcontactsovertime

• Weknowhowmanywe’veseen,butnothowmanyareleft.

• CanweestimateN_remgivenN_obs?Somecalculuslater…whyyes,wecan.

Nrem ≈ − slope(xi) × avg(xi)

y ∼ e−ax

Page 18: Five-sigma Network Events (and how to find them)

HostStabilizationDetector

�18

Q:Shouldn’tthisIPaddresshavestoppeddoingnewthingsbynow?

Contains:{IP_address:HyperLogLog}—cardinalityMapEachHLLcountsdistinctIP:portdestinationsoveralltime.

{IP_address:StdDev}—periodAverageMapEachStdDevincrementallycalculatesmeansandstdevs.

{IP_address:IncrLinReg}—IncrementalLinearRegressionMapEachStdDevincrementallycalculatesmeansandstdevs.

At each period:For each IP_address, HLL, StdDev, IncrLinReg:N_obs = HLL.cardinality()slope, intercept = IncrLinReg.estimate()avg = StdDev.getMean()N_rem = -slope * avgreportIfDisagree(N_rem < tol, IP_address.frozen)IP_address.setFrozen(N_rem < tol){IncrLinReg, StdDev}.update(N_obs, current_period)

Page 19: Five-sigma Network Events (and how to find them)

Demo Time!

�19

Page 20: Five-sigma Network Events (and how to find them)

But is it Gaussian?

�20

• “Long-tail”or“fat-tail”distributions?

• Trypowerlaworlog-linearfitting• Andmanyothers?• Butthiscangetcomplicated….

• ReplaceStdDevwithtdigest.TDigest

Page 21: Five-sigma Network Events (and how to find them)

Conclusions

�21

• Withouttheagonizingpain

• PythondatasciencetoolsFTW

• Coolsketching&streamingdatastructures

• “Alittlelearningisadangerousthing”…andalittlestatisticsisevenbetter!

• Onlythebeginning—lotsofroomforimprovement

Page 22: Five-sigma Network Events (and how to find them)

TheEndThanksforattending!

Suggestedquestions

1. HowdoIinstallPython,again?2. WhatcanIdowithflow_countsinmynetflows?3. ShowmethecalculusforestimatingNrem!4. So,whatistherealstatisticaldistributionofthatdata?5. HowdoesHyperLogLogwork?

http://github.com/EdgewiseNetworks/five-sigma