KENEXIS Kenexis Correlating Risk Events and Process Trends to Improve Reliability Bryan L Singer, CISM, CISSP, CAP Principal Investigator KENEXIS Security Corporation, Co-Chair ISA-99 and ISA-99 WG7 Copyright © 2010 Kenexis Security Corporation

Bryan Singer S4 Presentation


Presentation I gave at S4 this year on correlating ICS Network Performance to Process Trends



Page 2: Bryan Singer   S4 Presentation

Assumptions

• A generic PLC/HMI/controller architecture and TCP/IP
• This session focuses on ICS network behaviors and vulnerabilities, irrespective of the "cause"
• OEE is used as the KPI example; it is common in discrete and hybrid manufacturing, and similar KPIs exist in process industries
• Emphasis is on process improvement and intelligence more than security

Page 3: Bryan Singer   S4 Presentation

Respond Plan Prepare Defend

LIMITATIONS OF ROSI

Page 4: Bryan Singer   S4 Presentation

Challenges for ROSI

Common Example of a High Level Risk Study
Given: 100 plants

Event         SLE          ARO   ALE         Extended ALE  RF             Residual Risk
Minor         $50,000      10    $500,000    $50,000,000   65%            $17,500,000
Moderate      $250,000     3     $750,000    $75,000,000   55%            $33,750,000
Significant   $1,000,000   0.2   $200,000    $20,000,000   25%            $15,000,000
Catastrophic  $20,000,000  0.01  $200,000    $20,000,000   5%             $19,000,000
Total                            $1,650,000  $165,000,000  ($79,750,000)  $85,250,000
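The arithmetic behind the risk table can be reproduced with a short sketch. It assumes that ALE = SLE * ARO, that Extended ALE multiplies ALE by the 100 plants given on the slide, and that RF is the fraction of risk removed, so Residual Risk = Extended ALE * (1 - RF). That reading of RF is an interpretation that matches the table's numbers; the class and variable names are illustrative only.

```python
from dataclasses import dataclass

PLANTS = 100  # "Given: 100 plants" on the slide


@dataclass
class RiskEvent:
    name: str
    sle: float  # single loss expectancy, dollars per event
    aro: float  # annualized rate of occurrence, events per year
    rf: float   # assumed meaning: fraction of risk removed by the control

    @property
    def ale(self) -> float:
        # Annualized loss expectancy for one plant
        return self.sle * self.aro

    @property
    def extended_ale(self) -> float:
        # ALE extended across all plants
        return self.ale * PLANTS

    @property
    def residual(self) -> float:
        # Risk remaining after the assumed reduction factor is applied
        return self.extended_ale * (1.0 - self.rf)


EVENTS = [
    RiskEvent("Minor", 50_000, 10, 0.65),
    RiskEvent("Moderate", 250_000, 3, 0.55),
    RiskEvent("Significant", 1_000_000, 0.2, 0.25),
    RiskEvent("Catastrophic", 20_000_000, 0.01, 0.05),
]

total_ale = sum(e.ale for e in EVENTS)
total_extended = sum(e.extended_ale for e in EVENTS)
total_residual = sum(e.residual for e in EVENTS)
total_reduction = total_extended - total_residual

for e in EVENTS:
    print(f"{e.name:<13}{e.ale:>12,.0f}{e.extended_ale:>15,.0f}{e.residual:>15,.0f}")
print(f"{'Total':<13}{total_ale:>12,.0f}{total_extended:>15,.0f}{total_residual:>15,.0f}")
print(f"Risk reduction: {total_reduction:,.0f}")
```

Under these assumptions the totals match the slide's bottom row, which suggests the parenthesized figure is the total risk reduction.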

Page 5: Bryan Singer   S4 Presentation

Typical Cost Models

Cost Item                   Cost Per Site
Firewall                    $75,000
Routers and Switches        $125,000
Cabling and Infrastructure  $25,000
Labor                       $72,000
Cost Per Site               $297,000
Extended Cost               $29,700,000

• Risk Analysis Alone: 52.76%?
• Rationalized Study: 18-22%
• Insufficient for Controllership Criteria (typically ~33%)

Page 6: Bryan Singer   S4 Presentation

What Complicates ROSI?

• Lack of ICS Vulnerability Data
• Insufficient Incident Data, public and private
• Improper/insufficient definition of "security"
• Psychological Limitations
• Business Objective Conflicts
• Limited Budgets
• Controllership Requirements for Capital Expenditure

Page 7: Bryan Singer   S4 Presentation

What to Do?

• We KNOW This is an Issue
• What Other Tools are Available?

Page 8: Bryan Singer   S4 Presentation

Protecting the Process is ONGOING

Source: ANSI/ISA99.00.01

Page 9: Bryan Singer   S4 Presentation

ICS Security is Multi-Disciplined

Source: ANSI/ISA99.00.01

Page 10: Bryan Singer   S4 Presentation

Why Security Talent Capitalization is Low

From the Digital Bond blog post "Why Security Talent Capitalization Rate is Low":
http://www.digitalbond.com/index.php/2009/12/21/why-security-talent-capitalization-rate-is-low/

"[...] Security talent is not valued – Many of the skills that would make one talented in cyber security also can be applied to other control system endeavors. [...]"

Use this to our advantage!

Page 11: Bryan Singer   S4 Presentation

Process Reliability - A Deeper Inspection

• Every plant analyzes process performance
• Few plants monitor ICS network health
• The data is there, for those that care to look!

Page 12: Bryan Singer   S4 Presentation

ICS VERSUS IT NETWORKING

Page 13: Bryan Singer   S4 Presentation

IT Network Topology

Frequently follows the CDA model (Core, Distribution, Access) in Campus networks

Page 14: Bryan Singer   S4 Presentation

ICS Network Topologies

• Almost never follow the CDA model
• Sometimes have multipath routing and switching
• Mix of OSI Layer 2 and Layer 3
• Devices from 6 months to 10+ years old
• Mix of 10 Mbps half-duplex, 100 Mbps full-duplex, and gigabit links

Page 15: Bryan Singer   S4 Presentation

Data Flow in an IT Network

• Frequently characterized by short TCP connection times and small amounts of data
• Frequently off-site, client-server, with limited paths for a complete communication
• Rarely time critical below a few seconds

Page 16: Bryan Singer   S4 Presentation

Data Flow in an ICS Network

• Characterized by long TCP connection times
• Momentary blips or interruptions can fail-state the process
• Mostly “internal” to process network
• Limited external connections

Page 17: Bryan Singer   S4 Presentation

Network Performance Monitoring

• Focuses Primarily on layers 1 and 2
– SNMP
– Switch Statistics
– Cable Analyzers
– Utilization

Page 18: Bryan Singer   S4 Presentation

It's Not an Issue of Bandwidth

Latency is the #1 Observed Killer of ICS Network Reliability

Page 19: Bryan Singer   S4 Presentation

A Single Process Instruction...

• As Many as 19 Independent TCP Sessions
• Doesn't Include Process Intelligence Apps (MES, LIMS, etc)
• Any latency can result in failure

Page 20: Bryan Singer   S4 Presentation

VULNERABILITIES FOR ICS

Page 21: Bryan Singer   S4 Presentation

Most Vulnerabilities Have Network Components

A total of 31 common devices were tested: 17 DCS controllers and 14 SIS controllers. Of the 505 vulnerabilities detected, 298 reside in DCS controllers and 207 in SIS controllers. These vulnerabilities were found in the link, network, and transport layers (layers 2-4) of the network stack.

SOURCE: Wurldtech Security Technologies Delphi Study

Good Network Design and Performance Monitoring can GREATLY Improve Security.

Conversely, failures here can be system wide, and result in potentially hazardous conditions.

Page 22: Bryan Singer   S4 Presentation

Protocols and Potential Impact

SOURCE: Wurldtech Security Technologies Delphi Study

• Why is this?
– Network Stacks often limited
– Limited processing power
– Time critical functionality

Page 23: Bryan Singer   S4 Presentation

Most Vulnerabilities are Rate Dependent

SOURCE: Wurldtech Security Technologies Delphi Study

• Example:
– 1000 Packets: OK!
– 10,000 Packets: FAIL!

Page 24: Bryan Singer   S4 Presentation

Conclusions for ICS Networks

• Good Design can Mitigate Many Security Threats of Concern to ICS
• Good Design should be driven by:
– Principle of Least Route
– Usage of Managed Infrastructure
– Isolation of High Risk Domains
– Periodic Analysis of ICS Networks

Page 25: Bryan Singer   S4 Presentation

IMPACTS OF ICS NETWORK FAULTS

Page 26: Bryan Singer   S4 Presentation

Manifestations of ICS Network Failures

• Day to Day
– Application Delays
– Update Errors
– Poor Performance
– Nuisance Trips
– Failsafe Protocols
– Unexplained Shutdowns
• Catastrophic
– Extended Shutdowns
– Dangerous Failures

Rarely is the network blamed or even assessed

Page 27: Bryan Singer   S4 Presentation

ICS Failure Modes

• These Can Be Used to Model ICS Failure Scenarios
– Loss of View (LoV)
– Manipulation of View (MoV)
– Denial of Control (DoC)
– Loss of Control (LoC)

Page 28: Bryan Singer   S4 Presentation

2005 Texas City Example

• Failure in Isomerization Unit
• Vapor cloud explosion
• Faulty sensor and design
• Failure in physical valves
• Improper operator actions based on view
• Built up over 12-18 hours
• $87M fine from OSHA

"[…] Process safety did not have the same priority, at least, as commercial issues for John, and there were important performance gaps from a management accountability perspective concerning his actions (or inactions)," the report said.

Page 29: Bryan Singer   S4 Presentation

Additional Examples

• 1994 Texaco Refinery
– 18 hours buildup
– LoV, DoC, LoC progressed
– 30 critical alarms per minute generated at time of explosion
• Bellingham, WA Pipeline
– LoV, DoC, LoC
• NE Power Outage
– LoV, DoC

Page 30: Bryan Singer   S4 Presentation

And our Favorite Example?

• Vitek Boden....

... Just Kidding

Page 31: Bryan Singer   S4 Presentation

Observable ICS Network Conditions

• TCP errors (fast retransmissions, retransmissions, and duplicate ACKs)
• TCP receive window size shrinkage
• Excessive round trip time (different from TTL)
• TTL (Time to Live)
• Traffic rates by protocol
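As a rough illustration of how the first of these conditions could be counted, the sketch below scans one direction of a single TCP session and tallies likely retransmissions and duplicate ACKs. It is a simplified heuristic, not the method used in the presentation: the `(seq, payload_len, ack)` record format is hypothetical, sequence-number wraparound is ignored, and real analyzers apply additional checks (window size, SACK, timing).

```python
from collections import Counter


def tcp_conditions(segments):
    """Count likely retransmissions and duplicate ACKs.

    segments: iterable of (seq, payload_len, ack) tuples for ONE direction
    of ONE TCP session, in capture order (hypothetical record format).
    """
    counts = Counter()
    next_expected = None  # highest seq + payload_len seen so far
    last_ack = None       # ack number carried by the previous segment
    for seq, length, ack in segments:
        if length > 0:
            # Data that ends at or before bytes already sent is treated
            # as a retransmission.
            if next_expected is not None and seq + length <= next_expected:
                counts["retransmission"] += 1
            else:
                next_expected = max(next_expected or 0, seq + length)
        elif ack == last_ack:
            # A pure ACK repeating the previous ack number is treated
            # as a duplicate ACK.
            counts["duplicate_ack"] += 1
        last_ack = ack
    return counts


# Illustrative data only: one repeated data segment, then three identical ACKs
sample = [
    (1000, 100, 1),
    (1100, 100, 1),
    (1100, 100, 1),
    (1200, 0, 501),
    (1200, 0, 501),
    (1200, 0, 501),
]
result = tcp_conditions(sample)
print(dict(result))
```

Counts like these only become useful once they are trended over time and compared against process data, which is the theme of the slides that follow.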

Page 32: Bryan Singer   S4 Presentation

UNITING PROCESS PERFORMANCE AND ICS NETWORKING

Page 33: Bryan Singer   S4 Presentation

Bringing It All Together

SLFR_LVL2_01A

Port    Align-Err  FCS-Err  Xmit-Err  Rcv-Err  UnderSize
Fa0/13  2957       2182     0         5139     0
Fa0/14  0          0        0         0        0
Fa0/15  2146       1455     0         3601     0
Fa0/16  2796       2027     0         4823     0
Fa0/17  18696      24375    0         43071    0
Fa0/18  986        759      0         1745     0
Fa0/19  833        672      0         1505     0
Fa0/20  1511       1250     0         2761     0
Fa0/21  3374       3196     0         6570     0
Fa0/22  0          0        0         0        0
Fa0/23  2774       3780     0         6554     0
Fa0/24  218        302      0         520      0

THIS is how you get ROSI!
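One simple way to read a counter table like this is to flag ports whose receive errors stand far above the rest. The sketch below does that with a median-based cutoff; the factor of 3 is an arbitrary illustrative threshold, not a value from the presentation, and the function name is hypothetical.

```python
from statistics import median

# (port, align_err, fcs_err, xmit_err, rcv_err, undersize), copied from the slide
COUNTERS = [
    ("Fa0/13", 2957, 2182, 0, 5139, 0),
    ("Fa0/14", 0, 0, 0, 0, 0),
    ("Fa0/15", 2146, 1455, 0, 3601, 0),
    ("Fa0/16", 2796, 2027, 0, 4823, 0),
    ("Fa0/17", 18696, 24375, 0, 43071, 0),
    ("Fa0/18", 986, 759, 0, 1745, 0),
    ("Fa0/19", 833, 672, 0, 1505, 0),
    ("Fa0/20", 1511, 1250, 0, 2761, 0),
    ("Fa0/21", 3374, 3196, 0, 6570, 0),
    ("Fa0/22", 0, 0, 0, 0, 0),
    ("Fa0/23", 2774, 3780, 0, 6554, 0),
    ("Fa0/24", 218, 302, 0, 520, 0),
]


def flag_ports(rows, factor=3.0):
    """Return ports whose Rcv-Err exceeds factor times the median Rcv-Err."""
    cutoff = factor * median(r[4] for r in rows)
    return [r[0] for r in rows if r[4] > cutoff]


suspects = flag_ports(COUNTERS)
print(suspects)

# In this data set every Rcv-Err value equals Align-Err + FCS-Err.
consistent = all(r[1] + r[2] == r[4] for r in COUNTERS)
print(consistent)
```

With this cutoff only Fa0/17 is flagged, which is the port a reader would pick out by eye. A per-port baseline over time would be a more robust choice than a single snapshot.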

Page 34: Bryan Singer   S4 Presentation

Reconsidering Risk Approaches

• Risk May be Systematically Reduced Through Two Actions:
– Reduction in Certain events
– Reduction in Uncertainty
• For Process Control
– Associate ICS network to process performance
– Present information to operators
– Allows early warning of impending failure
– Provides mechanisms to measure network and security improvement

Page 35: Bryan Singer   S4 Presentation

Effective ICS Network and Security Services

• Measure OSI Layer 1 and 2
• Inspect Layer 3 and Higher in Context of ICS Communications
• Associate Network Conditions to Observable Process Conditions

Page 36: Bryan Singer   S4 Presentation

Associating Process Trends

• Key Performance Indicators (KPI)
– Measured Value KPI
– Calculated KPI
• Process Intelligence Applications (MES, historians, dashboards, etc.) display measures, values, and trends

Page 37: Bryan Singer   S4 Presentation

Presentation KPI Example: OEE

• OEE: Overall Equipment Effectiveness
• Measure of Performance Against Theoretical Ideal
• Comprised of Three Calculated KPIs
– Availability
– Performance
– Quality

Measure       Value
Availability  90%
Performance   95%
Quality       99.9%

The resulting OEE is (90/100) * (95/100) * (99.9/100), or (90 * 95 * 99.9) / 10,000, which is approximately 85.4%.
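The OEE calculation on this slide is a product of three fractions, which a minimal sketch makes explicit. The function name and the input validation are illustrative choices, not part of the presentation.

```python
def oee(availability: float, performance: float, quality: float) -> float:
    """Overall Equipment Effectiveness as a fraction between 0 and 1.

    Each argument is a fraction (0.90 for 90%), not a percentage.
    """
    for name, value in (
        ("availability", availability),
        ("performance", performance),
        ("quality", quality),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return availability * performance * quality


# Values from the slide: 90%, 95%, 99.9%
example = oee(0.90, 0.95, 0.999)
print(f"OEE = {example:.1%}")
```

Because the three factors multiply, a small loss in any one of them lowers OEE directly, which is why network-induced stops and rejects show up in this KPI.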

Page 38: Bryan Singer   S4 Presentation

Relating ICS Network to OEE

Six Big Losses, grouped by the OEE factor they impact:

Availability
• Breakdowns
– Down Time (cumulative and event)
– Real-time production mode indication
– Reason Code tracking and analysis
– Statistics and metrics are real-time automated
– Operators can focus on getting equipment running
• Setup and Adjustments
– Setup Time (cumulative and event)
– Set goals for Setup Time reduction programs

Performance
• Small Stops
– Average Cycle Time
– Small Stops (occurrences and time)
– Configurable Small Stop Threshold
– Cycle Time Trace records every cycle
– Identify when and how time is lost to Small Stops
• Reduced Speed
– Reduced Speed (occurrences and time)
– Configurable Reduced Speed Threshold
– Cycle Time Trace records every cycle
– Identify Reduced Speed patterns

Quality
• Startup Rejects
– Reject Pieces (during Startup)
– Percent Reject Pieces (during Startup)
• Production Rejects
– Reject Pieces (during Production)
– Percent Reject Pieces (during Production)

Page 39: Bryan Singer   S4 Presentation

Measuring Specific ICS Events

• SPAN Ports
• Active Taps
• Sniffers
• Capture Filters
• SNORT rules

Page 40: Bryan Singer   S4 Presentation

Importance of Analyzing Over Time

• Communications Are a Variable
• Many Causes Can Create Momentary Network Glitches that are NOT Harmful

Page 41: Bryan Singer   S4 Presentation

Reducing False Positives

• SPC for ICS Networks
• DSP could be used for further reduction in false positives

[Figure: chart with limits labeled High-Low and High-High]
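A minimal sketch of the SPC idea applied to a network metric follows. It assumes a baseline of round-trip times in milliseconds (illustrative values), sets a "high" limit at mean + 2 sigma and a "high-high" limit at mean + 3 sigma, and only raises "high" after several consecutive excursions so that a momentary glitch does not alarm. The sigma multipliers, the run length, and the names are assumptions, not values from the presentation.

```python
from statistics import mean, stdev


def control_limits(baseline, high_sigma=2.0, high_high_sigma=3.0):
    """Return (high, high_high) limits derived from a baseline sample."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    return mu + high_sigma * sigma, mu + high_high_sigma * sigma


def classify(samples, high, high_high, run_length=3):
    """Label each sample 'ok', 'high', or 'high-high'.

    'high-high' is raised immediately; 'high' requires run_length
    consecutive samples above the high limit.
    """
    labels = []
    run = 0
    for value in samples:
        run = run + 1 if value > high else 0
        if value > high_high:
            labels.append("high-high")
        elif run >= run_length:
            labels.append("high")
        else:
            labels.append("ok")
    return labels


# Illustrative round-trip times in milliseconds
baseline_rtt = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10]
high, high_high = control_limits(baseline_rtt)

observed = [10, 12.5, 10, 12.5, 12.6, 12.7, 20, 10]
labels = classify(observed, high, high_high)
print(labels)
```

The isolated 12.5 ms reading stays "ok", the sustained run becomes "high", and the 20 ms spike is "high-high". Tuning the run length trades detection delay against false positives.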

Page 42: Bryan Singer   S4 Presentation

Associate ICS Events to Process Performance

• Time Synchronization is CRITICAL
• Provide Information on Operator Screens
• Associate Network Conditions to Process Conditions

Page 43: Bryan Singer   S4 Presentation

What To Do With The Data?

• PCAPs of the event are recorded at time of event
• No need to "wait" for condition to reappear
• Advanced remote support and diagnostics are possible without touching a running process
• Recording in a separate system on an out-of-band network channel enables forensics of the actual event

Page 44: Bryan Singer   S4 Presentation

ICS Network to OEE Case Study

Given: 1 Plant with 4 Lines, and $12,000 per +/- OEE point, per line

        Availability  Quality  Performance  OEE
Pre     96%           97.5%    92.33%       86.42%
Plan    97.1%         97.8%    94.3%        89.55%
Actual  97.08%        97.72%   94.01%       89.18%

Actual versus Pre: +2.76% OEE, $99,474.77
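The case study figures can be checked with a short sketch. It assumes OEE is the product of the three factors and that the dollar value is OEE points gained * $12,000 * number of lines; the function and variable names are illustrative.

```python
DOLLARS_PER_POINT_PER_LINE = 12_000  # from the slide


def oee(availability, quality, performance):
    """OEE as a fraction, from three fractional factors."""
    return availability * quality * performance


def value_of_gain(points, lines):
    """Dollar value of an OEE gain expressed in percentage points."""
    return points * DOLLARS_PER_POINT_PER_LINE * lines


pre = oee(0.96, 0.975, 0.9233)
plan = oee(0.971, 0.978, 0.943)
actual = oee(0.9708, 0.9772, 0.9401)

# Gain in OEE percentage points, kept unrounded for the dollar calculation
gain_points = (actual - pre) * 100

print(f"Pre {pre:.2%}  Plan {plan:.2%}  Actual {actual:.2%}")
print(f"Gain: {gain_points:.2f} points")
print(f"Value over 3 lines: ${value_of_gain(gain_points, 3):,.2f}")
print(f"Value over 4 lines: ${value_of_gain(gain_points, 4):,.2f}")
```

Under these assumptions the three OEE values and the +2.76 gain match the slide. The $99,474.77 figure is reproduced with three lines and the unrounded gain, while four lines would give $132,633.03, so the slide's dollar figure and its "4 Lines" premise may not refer to the same scope.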

Page 45: Bryan Singer   S4 Presentation

Critical to Success

• Ability to capture all the required data
– High quality network cards
– Fast processor and memory
– Use tcpdump or similar
– Capture with active taps or SPAN
• Capture meaningful data
– Various tap points for collection, not just core switches and paths
– Network captures MUST be time synchronized to datastores (e.g., the historian)
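The time-synchronization requirement can be illustrated with a sketch that pairs a network event timestamp with the nearest historian sample. It assumes both systems report time on the same clock in seconds and that the historian data is sorted by timestamp; the record layout, the tag values, and the `max_skew` tolerance are hypothetical.

```python
from bisect import bisect_left


def nearest_sample(historian, event_time, max_skew=1.0):
    """Return the (timestamp, value) sample closest to event_time.

    historian: list of (timestamp, value) tuples sorted by timestamp.
    Returns None when no sample lies within max_skew seconds.
    """
    times = [t for t, _ in historian]
    i = bisect_left(times, event_time)
    # Only the neighbors on either side of the insertion point can be nearest.
    candidates = historian[max(i - 1, 0): i + 1]
    if not candidates:
        return None
    best = min(candidates, key=lambda s: abs(s[0] - event_time))
    return best if abs(best[0] - event_time) <= max_skew else None


# Illustrative historian trend: a process value that drops at t = 102
trend = [(100.0, 55.1), (101.0, 55.0), (102.0, 41.7), (103.0, 40.2)]

match = nearest_sample(trend, 101.8)  # e.g. time of a burst of retransmissions
miss = nearest_sample(trend, 110.0)   # no process data near this time
print(match, miss)
```

If the capture host and the historian drift apart by more than the tolerance, events either pair with the wrong sample or with none, which is why the slide treats synchronization as mandatory.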

Page 46: Bryan Singer   S4 Presentation

Summary

• Today, justifying security remains a challenge
• There is a significant gap in understanding between process performance and network performance
• Security and network failures manifest in measurable ways
• Associating network failures to process conditions is an effective and viable activity
• Key is to determine causal factors and clear KPIs