Presentation I gave at S4 this year on correlating ICS Network Performance to Process Trends
KENEXIS
Correlating Risk Events and Process Trends to Improve Reliability
Bryan L Singer, CISM, CISSP, CAP
Principal Investigator, Kenexis Security Corporation
Co-Chair ISA-99 and ISA-99 WG7
Copyright © 2010 Kenexis Security Corporation
Assumptions
• A generic PLC/HMI/controller architecture and TCP/IP
• This session focuses on ICS network behaviors and vulnerabilities, irrespective of the "cause"
• OEE is used as a KPI example: it is common in discrete and hybrid manufacturing, and there are similar KPIs in process industries
• Emphasis is on process improvement and intelligence more than security
RespondPlan Prepare Defend
LIMITATIONS OF ROSI
Challenges for ROSI
Common Example of a High Level Risk Study
Given: 100 plants

Event         SLE          ARO   ALE         Extended ALE  RF             Residual Risk
Minor         $50,000      10    $500,000    $50,000,000   65%            $17,500,000
Moderate      $250,000     3     $750,000    $75,000,000   55%            $33,750,000
Significant   $1,000,000   0.2   $200,000    $20,000,000   25%            $15,000,000
Catastrophic  $20,000,000  0.01  $200,000    $20,000,000   5%             $19,000,000
Total                            $1,650,000  $165,000,000  ($79,750,000)  $85,250,000
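The arithmetic behind this risk table can be reproduced with a short sketch. It assumes the usual reading of the column headings: SLE is the single loss expectancy, ARO the annualized rate of occurrence, and ALE = SLE × ARO. It also assumes that "Extended ALE" is the ALE multiplied across the 100 plants and that RF is the fraction of risk removed, so residual risk is Extended ALE × (1 − RF). The class and function names are illustrative and do not come from the presentation.

```python
from dataclasses import dataclass

PLANTS = 100  # "Given: 100 plants"


@dataclass
class RiskEvent:
    name: str
    sle: float  # single loss expectancy, USD per occurrence
    aro: float  # annualized rate of occurrence
    rf: float   # assumed meaning: fraction of risk removed (0..1)

    @property
    def ale(self) -> float:
        """Annualized loss expectancy for one plant."""
        return self.sle * self.aro

    @property
    def extended_ale(self) -> float:
        """ALE extended across all plants."""
        return self.ale * PLANTS

    @property
    def residual(self) -> float:
        """Risk remaining after the assumed reduction factor is applied."""
        return self.extended_ale * (1.0 - self.rf)


EVENTS = [
    RiskEvent("Minor", 50_000, 10, 0.65),
    RiskEvent("Moderate", 250_000, 3, 0.55),
    RiskEvent("Significant", 1_000_000, 0.2, 0.25),
    RiskEvent("Catastrophic", 20_000_000, 0.01, 0.05),
]


def totals(events):
    """Sum the per-event figures into the table's bottom row."""
    extended = sum(e.extended_ale for e in events)
    residual = sum(e.residual for e in events)
    return {
        "ale": sum(e.ale for e in events),
        "extended_ale": extended,
        "reduction": extended - residual,
        "residual": residual,
    }


if __name__ == "__main__":
    for e in EVENTS:
        print(f"{e.name:<13} ALE ${e.ale:>12,.0f}  residual ${e.residual:>14,.0f}")
    for key, value in totals(EVENTS).items():
        print(f"{key:<13} ${value:>14,.0f}")
```

Under these assumptions the totals match the bottom row of the table.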
Typical Cost Models

Cost Item                   Cost Per Site
Firewall                    $75,000
Routers and Switches        $125,000
Cabling and Infrastructure  $25,000
Labor                       $72,000
Cost Per Site               $297,000
Extended Cost               $29,700,000

• Risk Analysis Alone: 52.76%?
• Rationalized Study: 18-22%
• Insufficient for Controllership Criteria (typically ~33%)
What Complicates ROSI?
• Lack of ICS Vulnerability Data
• Insufficient Incident Data - public and private
• Improper/insufficient definition of "security"
• Psychological Limitations
• Business Objective Conflicts
• Limited Budgets
• Controllership Requirements for Capital Expenditure
What to Do?
• We KNOW This is an Issue
• What Other Tools are Available?
Protecting the Process is ONGOING
Source: ANSI/ISA99.00.01
ICS Security is Multi-Disciplined
Source: ANSI/ISA99.00.01
Why Security Talent Capitalization is Low
From Digital Bond Blog, "Why Security Talent Capitalization Rate is Low"
http://www.digitalbond.com/index.php/2009/12/21/why-security-talent-capitalization-rate-is-low/
[...] Security talent is not valued – Many of the skills that would make one talented in cyber security also can be applied to other control system endeavors. [...]
Use this to our advantage!
Process Reliability - A Deeper Inspection
• Every plant analyzes process performance
• Few plants monitor ICS network health
• The data is there, for those that care to look!
ICS VERSUS IT NETWORKING
IT Network Topology
Frequently follows the CDA model (Core, Distribution, Access) in Campus networks
ICS Network Topologies
• Almost never follow the CDA model
• Sometimes have multipath routing and switching
• Mix of OSI Layer 2, Layer 3
• Devices from 6 months to 10+ years old
• Mix of 10 Mbps/Half, 100 Mbps/Full, and Gbit
Data Flow in an IT Network
• Frequently characterized by short TCP connection times, small amounts of data
• Frequently off-site, client-server, and limited paths for a complete communication
• Rarely time critical below a few seconds
Data Flow in an ICS Network
• Characterized by long TCP connection times
• Momentary blips or interruptions can fail-state the process
• Mostly “internal” to process network
• Limited external connections
Network Performance Monitoring
• Focuses Primarily on layers 1 and 2
  – SNMP
  – Switch Statistics
  – Cable Analyzers
  – Utilization
It's Not an Issue of Bandwidth
Latency is the #1 Observed Killer of ICS Network Reliability
A Single Process Instruction...
• As Many as 19 Independent TCP Sessions
• Doesn't Include Process Intelligence Apps (MES, LIMS, etc)
• Any latency can result in failure
VULNERABILITIES FOR ICS
Most Vulnerabilities Have Network Components
A total of 31 common devices were tested: 17 DCS controllers and 14 SIS controllers. Of the 505 vulnerabilities detected, 298 reside in DCS controllers and 207 in SIS controllers. These vulnerabilities were found in the link, network, and transport layers (layers 2-4) of the network stack.
SOURCE: Wurldtech Security Technologies Delphi Study
Good Network Design and Performance Monitoring can GREATLY Improve Security.
Conversely, failures here can be system-wide and result in potentially hazardous conditions.
Protocols and Potential Impact
SOURCE: Wurldtech Security Technologies Delphi Study
• Why is this?
• Network Stacks often limited
• Limited processing power
• Time critical functionality
Most Vulnerabilities are Rate Dependent
SOURCE: Wurldtech Security Technologies Delphi Study
• Example:
• 1,000 Packets - OK!
• 10,000 Packets - FAIL!
Conclusions for ICS Networks
• Good Design can Mitigate Many Security Threats of Concern to ICS
• Good Design should be driven by:
  – Principle of Least Route
  – Usage of Managed Infrastructure
  – Isolation of High Risk Domains
  – Periodic Analysis of ICS Networks
IMPACTS OF ICS NETWORK FAULTS
Manifestations of ICS Network Failures
• Day to Day
  – Application Delays
  – Update Errors
  – Poor Performance
  – Nuisance Trips
  – Failsafe Protocols
  – Unexplained Shutdowns
• Catastrophic
  – Extended Shutdowns
  – Dangerous Failures
Rarely is the network blamed or even assessed
ICS Failure Modes
• These Can Be Used to Model ICS Failure Scenarios
  – Loss of View (LoV)
  – Manipulation of View (MoV)
  – Denial of Control (DoC)
  – Loss of Control (LoC)
2005 Texas City Example
• Failure in Isomerization Unit
• Vapor cloud explosion
• Faulty sensor and design
• Failure in physical valves
• Improper operator actions based on view
• Built up over 12-18 hours
• $87M fine from OSHA
"[…] Process safety did not have the same priority, at least, as commercial issues for John, and there were important performance gaps from a management accountability perspective concerning his actions (or inactions)," the report said.
Additional Examples
• 1994 Texaco Refinery
  – 18 hours buildup
  – LoV, DoC, LoC progressed
  – 30 critical alarms per minute generated at time of explosion
• Bellingham, WA Pipeline
  – LoV, DoC, LoC
• NE Power Outage
  – LoV, DoC
And our Favorite Example?
• Vitek Boden....
... Just Kidding
Observable ICS Network Conditions
• TCP errors (fast retransmissions, retransmissions, and duplicate ACKs)
• TCP receive window size shrinkage
• Excessive round trip time (different from TTL)
• TTL (Time to Live)
• Traffic rates by protocol
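As a rough illustration of how some of these conditions could be counted from captured traffic, the sketch below walks the segments of one direction of a single TCP flow and tallies retransmissions, duplicate ACKs, and receive-window shrinkage. The `Segment` record, the `summarize` function, and the sample values are all hypothetical. The heuristics are deliberately simplified: they ignore sequence-number wraparound, SACK, and out-of-order delivery, and they do not cover round trip time, TTL, or per-protocol traffic rates.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    t: float      # capture timestamp, seconds
    seq: int      # TCP sequence number
    ack: int      # TCP acknowledgment number
    length: int   # payload bytes
    window: int   # advertised receive window


def summarize(segments):
    """Count simplified TCP health indicators for one direction of one flow."""
    counts = Counter()
    highest_end = None   # highest (seq + length) seen so far
    last_pure_ack = None  # (ack, window) of the previous zero-length segment
    last_window = None
    for s in segments:
        if s.length > 0:
            end = s.seq + s.length
            if highest_end is not None and end <= highest_end:
                # Payload that does not advance the stream: treat as a resend.
                counts["retransmission"] += 1
            else:
                highest_end = end
            last_pure_ack = None
        else:
            if last_pure_ack == (s.ack, s.window):
                # Same ACK number and window, no payload: duplicate ACK.
                counts["duplicate_ack"] += 1
            last_pure_ack = (s.ack, s.window)
        if last_window is not None and s.window < last_window:
            counts["window_shrink"] += 1
        last_window = s.window
    return counts


# Hypothetical sample: one resend, one duplicate ACK, one window reduction.
SAMPLE = [
    Segment(0.000, 1000, 500, 100, 8192),
    Segment(0.010, 1100, 500, 100, 8192),
    Segment(0.250, 1000, 500, 100, 8192),  # resend of the first segment
    Segment(0.260, 1200, 500, 0, 4096),    # window drops from 8192 to 4096
    Segment(0.270, 1200, 500, 0, 4096),    # repeats the previous pure ACK
]

if __name__ == "__main__":
    print(dict(summarize(SAMPLE)))
```

In practice the segment records would come from a capture tool rather than a hand-written list; the point is only that each listed condition reduces to a counter that can be trended over time.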
UNITING PROCESS PERFORMANCE AND ICS NETWORKING
Bringing It All Together
Switch         Port    Align-Err  FCS-Err  Xmit-Err  Rcv-Err  UnderSize
SLFR_LVL2_01A  Fa0/13  2957       2182     0         5139     0
               Fa0/14  0          0        0         0        0
               Fa0/15  2146       1455     0         3601     0
               Fa0/16  2796       2027     0         4823     0
               Fa0/17  18696      24375    0         43071    0
               Fa0/18  986        759      0         1745     0
               Fa0/19  833        672      0         1505     0
               Fa0/20  1511       1250     0         2761     0
               Fa0/21  3374       3196     0         6570     0
               Fa0/22  0          0        0         0        0
               Fa0/23  2774       3780     0         6554     0
               Fa0/24  218        302      0         520      0
THIS is how you get ROSI!
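One way to turn counters like these into something actionable is to parse them and flag the ports that stand out. The sketch below embeds the per-port figures from the slide and flags any port whose Rcv-Err count exceeds a multiple of the median among ports that report errors. The function names and the factor of 3 are assumptions chosen for illustration, not a recommended threshold.

```python
from statistics import median

FIELDS = ("align_err", "fcs_err", "xmit_err", "rcv_err", "undersize")

# Per-port counters for switch SLFR_LVL2_01A, copied from the slide.
COUNTERS = """\
Fa0/13 2957 2182 0 5139 0
Fa0/14 0 0 0 0 0
Fa0/15 2146 1455 0 3601 0
Fa0/16 2796 2027 0 4823 0
Fa0/17 18696 24375 0 43071 0
Fa0/18 986 759 0 1745 0
Fa0/19 833 672 0 1505 0
Fa0/20 1511 1250 0 2761 0
Fa0/21 3374 3196 0 6570 0
Fa0/22 0 0 0 0 0
Fa0/23 2774 3780 0 6554 0
Fa0/24 218 302 0 520 0
"""


def parse_counters(text):
    """Map each port name to a dict of its integer error counters."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != len(FIELDS) + 1:
            continue  # skip blank or malformed lines
        rows[parts[0]] = dict(zip(FIELDS, (int(p) for p in parts[1:])))
    return rows


def flag_ports(rows, factor=3.0):
    """Return ports whose Rcv-Err exceeds factor x the median of erroring ports."""
    active = [r["rcv_err"] for r in rows.values() if r["rcv_err"] > 0]
    if not active:
        return []
    threshold = factor * median(active)
    return sorted(p for p, r in rows.items() if r["rcv_err"] > threshold)


if __name__ == "__main__":
    rows = parse_counters(COUNTERS)
    for port in flag_ports(rows):
        print(port, rows[port])
```

With the slide's data only Fa0/17 is flagged, and on every port Rcv-Err equals the sum of Align-Err and FCS-Err.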
Reconsidering Risk Approaches
• Risk May be Systematically Reduced Through Two Actions:
  – Reduction in Certain events
  – Reduction in Uncertainty
• For Process Control
  – Associate ICS network to process performance
  – Present information to operators
  – Allows early warning of impending failure
  – Provides mechanisms to measure network and security improvement
Effective ICS Network and Security Services
• Measure OSI Layer 1 and 2
• Inspect Layer 3 and Higher in Context of ICS Communications
• Associate Network Conditions to Observable Process Conditions
Associating Process Trends
• Key Performance Indicators (KPI)
  – Measured Value KPI
  – Calculated KPI
• Process Intelligence Applications (MES, historians, dashboards, etc.) display measures, values, and trends
Presentation KPI Example: OEE
• OEE: Overall Equipment Effectiveness
• Measure of Performance Against Theoretical Ideal
• Composed of Three Calculated KPIs
  – Availability
  – Performance
  – Quality

Measure       Value
Availability  90%
Performance   95%
Quality       99.9%

The resulting OEE is (90/100) * (95/100) * (99.9/100), or (90 * 95 * 99.9) / 10000 expressed as a percentage, which is 85.4%.
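The same calculation as a minimal sketch, with the three factors expressed as fractions between 0 and 1. The function name `oee` and the range check are illustrative additions.

```python
def oee(availability: float, performance: float, quality: float) -> float:
    """Overall Equipment Effectiveness as the product of its three factors."""
    for name, value in (("availability", availability),
                        ("performance", performance),
                        ("quality", quality)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a fraction between 0 and 1")
    return availability * performance * quality


if __name__ == "__main__":
    # Values from the slide: 90%, 95%, 99.9%
    print(f"{oee(0.90, 0.95, 0.999):.1%}")  # prints 85.4%
```

Because the factors multiply, a small loss in any one of them lowers the overall figure even when the other two are near ideal.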
Relating ICS Network to OEE
Six Big Losses (OEE Factor): OEE Impact

Breakdowns (Availability)
• Down Time (cumulative and event)
• Real-time production mode indication
• Reason Code tracking and analysis
• Statistics and metrics are real-time automated
• Operators can focus on getting equipment running

Setup and Adjustments (Availability)
• Setup Time (cumulative and event)
• Set goals for Setup Time reduction programs

Small Stops (Performance)
• Average Cycle Time
• Small Stops (occurrences and time)
• Configurable Small Stop Threshold
• Cycle Time Trace records every cycle
• Identify when and how time is lost to Small Stops

Reduced Speed (Performance)
• Reduced Speed (occurrences and time)
• Configurable Reduced Speed Threshold
• Cycle Time Trace records every cycle
• Identify Reduced Speed patterns

Startup Rejects (Quality)
• Reject Pieces (during Startup)
• Percent Reject Pieces (during Startup)

Production Rejects (Quality)
• Reject Pieces (during Production)
• Percent Reject Pieces (during Production)
Measuring Specific ICS Events
• SPAN Ports
• Active Taps
• Sniffers
• Capture Filters
• SNORT rules
Importance of Analyzing Over Time
• Communications are a Variable
• Many Causes Can Create Momentary Network Glitches that are NOT Harmful
Reducing False Positives
• SPC for ICS Networks
• DSP could be used for further reduction in false positives
[Figure labels: High-Low, High-High]
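A minimal sketch of the SPC idea, reading SPC as statistical process control applied to a network measurement such as round trip time. It derives two upper limits from a baseline and raises a condition only after several consecutive samples exceed a limit, so a single momentary glitch is ignored. Placing the limits at 2 and 3 standard deviations above the mean, naming them `high` and `high_high`, and requiring three consecutive samples are all assumptions for illustration. The slide does not specify how its High-Low and High-High bands are defined.

```python
from statistics import mean, stdev


def control_limits(baseline, warn_sigmas=2.0, alarm_sigmas=3.0):
    """Derive upper control limits from a baseline of normal measurements."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    return {"high": mu + warn_sigmas * sigma,
            "high_high": mu + alarm_sigmas * sigma}


def classify(samples, limits, persist=3):
    """Label each sample, escalating only after `persist` consecutive exceedances."""
    labels = []
    run_high = run_high_high = 0
    for x in samples:
        run_high = run_high + 1 if x > limits["high"] else 0
        run_high_high = run_high_high + 1 if x > limits["high_high"] else 0
        if run_high_high >= persist:
            labels.append("HIGH-HIGH")
        elif run_high >= persist:
            labels.append("HIGH")
        else:
            labels.append("ok")
    return labels


# Hypothetical round trip times in milliseconds.
BASELINE = [2.0, 2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1, 1.9, 2.0]
GLITCH_THEN_DRIFT = [2.0, 9.0, 2.1, 2.3, 2.3, 2.3, 2.0]

if __name__ == "__main__":
    limits = control_limits(BASELINE)
    print(limits)
    print(classify(GLITCH_THEN_DRIFT, limits))
```

In the sample series the isolated 9.0 ms spike is not reported, while the sustained drift to 2.3 ms is reported on its third consecutive sample.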
Associate ICS Events to Process Performance
• Time Synchronization CRITICAL
• Provide Information on Operator Screens
• Associate Network Conditions to Process Conditions
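The sketch below shows why time synchronization matters for this association: each network event is matched to the nearest process sample by timestamp, and the match is rejected if the two are further apart than a tolerance. The data layout, the function names, the 5-second tolerance, and the sample values are hypothetical. A real historian would be queried through its own interface.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_sample(times, values, when, tolerance):
    """Return the process value closest in time to `when`, or None if too far."""
    i = bisect_left(times, when)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
    if not candidates:
        return None
    j = min(candidates, key=lambda k: abs(times[k] - when))
    return values[j] if abs(times[j] - when) <= tolerance else None


def correlate(network_events, historian, tolerance=timedelta(seconds=5)):
    """Pair each (time, label) network event with a nearby historian value.

    `historian` is a list of (time, value) tuples sorted by time.
    """
    times = [t for t, _ in historian]
    values = [v for _, v in historian]
    return [(t, label, nearest_sample(times, values, t, tolerance))
            for t, label in network_events]


# Hypothetical data: one process tag sampled every 10 seconds.
T0 = datetime(2010, 1, 20, 8, 0, 0)
HISTORIAN = [(T0 + timedelta(seconds=10 * i), v)
             for i, v in enumerate([100.0, 99.8, 92.5, 99.9])]
NET_EVENTS = [
    (T0 + timedelta(seconds=21), "retransmission burst"),
    (T0 + timedelta(seconds=300), "duplicate ACKs"),
]

if __name__ == "__main__":
    for when, label, value in correlate(NET_EVENTS, HISTORIAN):
        print(when.time(), label, value)
```

If the capture host and the historian clocks drift apart by more than the tolerance, genuine matches are silently lost, which is the practical reason both must share a time source.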
What To Do With The Data?
• PCAPs of the event are recorded at time of event
• No need to "wait" for condition to reappear
• Advanced remote support and diagnostics are possible without touching a running process
• Recording in a separate system on an out-of-band network channel enables forensics of the actual event
ICS Network to OEE Case Study
Given: 1 Plant with 4 Lines, and $12,000 per +/- OEE point, per line

        Availability  Quality  Performance  OEE
Pre     96%           97.5%    92.33%       86.42%
Plan    97.1%         97.8%    94.3%        89.55%
Actual  97.08%        97.72%   94.01%       89.18%

Improvement (Actual vs. Pre): +2.76% OEE, $99,474.77
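The case study figures can be checked with a short sketch. It multiplies the three factors to get each OEE value, then values the gain from Pre to Actual at $12,000 per OEE point per line. The names below are illustrative. One observation from running the numbers: the $99,474.77 on the slide is reproduced by the unrounded gain (about 2.763 points) multiplied by $12,000 and by 3, whereas the stated 4 lines would give about $132,633. The line count is therefore left as a parameter rather than fixed.

```python
VALUE_PER_POINT = 12_000  # USD per OEE point, per line (from the slide)

# (availability, quality, performance) as fractions, from the slide
PRE = (0.96, 0.975, 0.9233)
PLAN = (0.971, 0.978, 0.943)
ACTUAL = (0.9708, 0.9772, 0.9401)


def oee(availability, quality, performance):
    """OEE as the product of its three factors."""
    return availability * quality * performance


def gain_value(before, after, lines, value_per_point=VALUE_PER_POINT):
    """Return (OEE points gained, dollar value across `lines` production lines)."""
    points = (oee(*after) - oee(*before)) * 100
    return points, points * value_per_point * lines


if __name__ == "__main__":
    for label, factors in (("Pre", PRE), ("Plan", PLAN), ("Actual", ACTUAL)):
        print(f"{label:<7}{oee(*factors):.2%}")
    for lines in (3, 4):
        points, value = gain_value(PRE, ACTUAL, lines)
        print(f"{lines} lines: +{points:.2f} points, ${value:,.2f}")
```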
Critical to Success
• Ability to capture all the required data
  – High quality network cards
  – Fast processor and memory
  – Use tcpdump or similar
  – Capture with active taps or SPAN
• Capture meaningful data
  – Various tap points for collection - not just core switches and paths
  – Network captures MUST be time synchronized to datastores (i.e. historian)
Summary
• Today, justifying security remains a challenge
• There is a significant gap in understanding between process performance and network performance
• Security and network failures manifest in measurable ways
• Associating network failures to process conditions is an effective and viable activity
• Key is to determine causal factors and clear KPIs