Fast Port Scan Using Sequential Hypothesis Testing Jaeyeon Jung, Vern Paxson, Arthur W. Berger, and Hari Balakrishnan

Introduction
• Port Scanning: Reconnaissance
  • Hackers will scan a host or hosts for vulnerable ports as potential avenues of attack
• Not clearly defined
  • Scan sweeps
    • Connection to a few addresses, some fail?
  • Granularity
    • Separate sources as one scan?
  • Temporal
    • Over what timeframe should activity be tracked?
  • Intent
    • Hard to differentiate between benign scans and scans with malicious intent

Previous Scanning Techniques
• Malformed Packets
  • Packets used for “stealth scanning”
• Connections to ports/hosts per unit time
  • Checks whether a source hits more than X ports on Y hosts in Z time
• Failed connections
  • Malicious connections will have a higher ratio of failed connection attempts

Bro NIDS
• Current algorithm in use for years
• High efficiency
• Counts local connections from remote host
• Differentiates connections by service
• Sets threshold
• Blocks suspected malicious hosts

Flaws in Bro
• Skewed for little-used servers
  • Example: a private host that one worker remotely logs into from home
• Difficult to choose probabilities
• Difficult to determine never-accessed hosts
• Needs data to determine appropriate parameters

Threshold Random Walk (TRW)
• Objectives for the new algorithm:
  • Require performance near Bro
  • High speed
  • Flag as scanner if no useful connection
  • Detect single remote hosts

Data Analysis
• Data analyzed from two sites, LBL and ICSI
• Research laboratories with minimal firewalling
• LBL: 6000 hosts, sparse host density
• ICSI: 200 hosts, dense host density

Separating Possible Scanners
• Which of the remainder are likely, but undetected, scanners?
• Argument nearly circular
  • Show that there are properties that can plausibly be used to distinguish likely scanners in the remainder
  • Use that as ground truth to develop an algorithm against

Data Analysis (cont.)
• First model
  • Look at remainder hosts making failed connections
  • Compare all of remainder to known bad
  • Hope for two modes, where the failed connection mode resembles the known bad
  • No such modality exists

Data Analysis (cont.)
• Second model
  • For each host, examine the ratio of failed connections made to successful connections made
  • Known bad have a high percentage of failed connections
  • Conclusion: remainder hosts with <80% failure are potentially benign
  • Rest are suspect
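
The second model's cutoff can be sketched in a few lines of Python. This is only an illustration: the function names and the per-host counts of failed and successful connections are assumptions, and only the 80% cutoff comes from the slide.

```python
def failure_ratio(failed: int, succeeded: int) -> float:
    """Fraction of a remote host's connection attempts that failed."""
    total = failed + succeeded
    if total == 0:
        raise ValueError("no connection attempts recorded")
    return failed / total


def label_remainder_host(failed: int, succeeded: int, cutoff: float = 0.80) -> str:
    """Label a remainder host using the slide's 80% failure cutoff.

    Hosts below the cutoff are treated as potentially benign,
    everything else as suspect.
    """
    if failure_ratio(failed, succeeded) < cutoff:
        return "benign"
    return "suspect"
```

For example, a hypothetical host with 9 failed and 1 successful connection would be labeled suspect, while one with 1 failed and 9 successful would be labeled benign.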

TRW (continued)
• Detect failed/succeeded connections
• Sequential Hypothesis Testing
  • Two hypotheses: benign (H_0) and scanner (H_1)
  • Probabilities determined by the equations: P(success | H_0) = theta_0, P(success | H_1) = theta_1
  • theta_0 > theta_1 (a benign host has a higher chance of a successful connection)
  • Four outcomes: detection, false positive, false negative, nominal
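
As a rough sketch of how the test accumulates evidence, the likelihood ratio can be built up one connection outcome at a time. The values theta0 = 0.8 and theta1 = 0.2 below are illustrative assumptions, not values taken from these slides, and the function name is hypothetical.

```python
def likelihood_ratio(outcomes, theta0=0.8, theta1=0.2):
    """Likelihood ratio P(outcomes | H_1) / P(outcomes | H_0).

    outcomes: iterable of booleans, True for a successful connection.
    theta0:   assumed probability of success for a benign host (H_0).
    theta1:   assumed probability of success for a scanner (H_1).
    """
    ratio = 1.0
    for success in outcomes:
        if success:
            # A success is more likely under H_0, so the ratio shrinks.
            ratio *= theta1 / theta0
        else:
            # A failure is more likely under H_1, so the ratio grows.
            ratio *= (1 - theta1) / (1 - theta0)
    return ratio
```

With these assumed values each failure multiplies the ratio by about 4 and each success by 0.25, so the ratio performs a random walk that drifts up for scanners and down for benign hosts.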

Thresholds
• Choose Thresholds
  • Set upper and lower thresholds, n_1 and n_0
  • Calculate likelihood ratio
  • Compare to thresholds
• Choosing Thresholds
  • Choose two constants, alpha and beta
    • Probability of false positive (P_f) <= alpha
    • Detection probability (P_d) >= beta
    • Typical values: alpha = 0.01, beta = 0.99
  • Thresholds can be bounded in terms of P_f and P_d, or set from alpha and beta
    • n_1 <= P_d/P_f
    • n_0 >= (1-P_d)/(1-P_f)
  • Can be approximated using alpha and beta
    • n_1 = beta/alpha
    • n_0 = (1-beta)/(1-alpha)
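
The approximation above is simple arithmetic, sketched here in Python. The function name and the input check are assumptions added for illustration; the formulas and the default values are the ones on the slide.

```python
def trw_thresholds(alpha=0.01, beta=0.99):
    """Approximate thresholds from the target error rates.

    alpha: upper bound on the false positive probability.
    beta:  lower bound on the detection probability.
    Returns (n_0, n_1), the lower and upper thresholds.
    """
    if not 0 < alpha < beta < 1:
        raise ValueError("expected 0 < alpha < beta < 1")
    n_1 = beta / alpha              # upper threshold: declare scanner (H_1)
    n_0 = (1 - beta) / (1 - alpha)  # lower threshold: declare benign (H_0)
    return n_0, n_1
```

With the typical values alpha = 0.01 and beta = 0.99, this gives n_1 = 99 and n_0 of roughly 0.0101.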

Evaluation Methodology
• Used the data from the two labs
• Knowledge of whether each connection is established, rejected, or unanswered
• Maintains 3 variables for each remote host
  • D_s, the set of distinct hosts previously connected to
  • S_s, the decision state (pending, H_0, or H_1)
  • L_s, the likelihood ratio

Evaluation Methodology (cont.)
• For each line in dataset
  • Skip if the remote host's state is not pending
  • Determine if connection is successful
  • Check whether the destination is already in the connection set D_s; if so, proceed to next line
  • Update D_s and L_s
  • If L_s goes beyond either threshold, update state accordingly
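
The per-line procedure can be sketched as a small Python loop. This is a minimal illustration, not the authors' implementation: the record format (source, destination, success), the names SourceState and run_trw, and the default values of theta0 and theta1 are all assumptions.

```python
from dataclasses import dataclass, field

PENDING, BENIGN, SCANNER = "pending", "H_0", "H_1"


@dataclass
class SourceState:
    destinations: set = field(default_factory=set)  # D_s
    decision: str = PENDING                          # S_s
    ratio: float = 1.0                               # L_s


def run_trw(records, theta0=0.8, theta1=0.2, alpha=0.01, beta=0.99):
    """Process (source, destination, success) records in order.

    theta0/theta1 are assumed success probabilities under H_0/H_1.
    Returns a dict mapping each source to its SourceState.
    """
    n_1 = beta / alpha              # upper threshold
    n_0 = (1 - beta) / (1 - alpha)  # lower threshold
    states = {}
    for source, destination, success in records:
        state = states.setdefault(source, SourceState())
        if state.decision != PENDING:
            continue  # a decision was already made for this source
        if destination in state.destinations:
            continue  # only the first contact with each host counts
        state.destinations.add(destination)
        if success:
            state.ratio *= theta1 / theta0
        else:
            state.ratio *= (1 - theta1) / (1 - theta0)
        if state.ratio >= n_1:
            state.decision = SCANNER
        elif state.ratio <= n_0:
            state.decision = BENIGN
    return states
```

Under these assumed parameters, a source whose first four connections to distinct hosts all fail crosses the upper threshold (4^4 = 256 exceeds 99), and a source with four successes crosses the lower one. Repeated attempts to the same destination leave the ratio unchanged.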

Results
• TRW Evaluation
  • Efficiency: ratio of true positives to all hosts flagged as scanners (H_1)
  • Effectiveness: ratio of true positives to all scanners
  • N: average number of hosts probed before detection
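
The first two metrics can be written out as a short Python sketch. It assumes efficiency means true positives divided by all hosts flagged H_1 (true plus false positives) and effectiveness means true positives divided by all actual scanners (true positives plus false negatives). The function names are hypothetical.

```python
def efficiency(true_positives: int, false_positives: int) -> float:
    """Share of hosts flagged as scanners (H_1) that really are scanners."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no hosts were flagged as scanners")
    return true_positives / flagged


def effectiveness(true_positives: int, false_negatives: int) -> float:
    """Share of all actual scanners that were flagged."""
    scanners = true_positives + false_negatives
    if scanners == 0:
        raise ValueError("no scanners in the data set")
    return true_positives / scanners
```

With made-up counts of 90 true positives, 10 false positives, and 30 false negatives, efficiency would be 0.9 and effectiveness 0.75.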

TRW Evaluation (cont.)
• TRW is far more effective than the other two algorithms compared (Bro and Snort)
• TRW is almost as efficient as Bro
• TRW detects scanners in far less time

Potential Improvements
• Leverage Additional Information
  • Factor for specific services (e.g. HTTP)
  • Distinguish between unanswered and rejected connections
  • Consider time local host has been inactive
  • Consider rate
  • Introduce correlations (e.g. 2 failed in a row worse than 1 fail, 1 success, 1 fail)
  • Devise a model on history of the hosts

Improvements (cont.)
• Managing State
  • Requires a large amount of maintained state for tracking
  • However, capping the state is vulnerable to state overflow attacks
• How to Respond
  • What to do when a scanner is detected?
  • Is it worth blocking?
• Evasion and Gaming
  • Spoofed IPs
    • Institute “whitelists”
    • Use a honeypot to try to connect
  • Evasion (inserting legitimate connections in scan)
    • Incorporate other information, such as a model of what is normal for legitimate users, and give less weight to connections not fitting the pattern
• Distributed Scans
  • Scans originating from more than one source
  • Difficult to fix in this framework

Conclusion/Summary
• TRW: based on ratio of failed/succeeded connections
• Sequential Hypothesis Testing
• Highly accurate
• Quick Response
