Building Defensive Architectures Using Backdoors
Liviu Iftode
Department of Computer Science
Rutgers University

Frustration Scalability
Planetary-Scale Services
- Human operators, phone calls and emails are hard to scale
- Cost of ownership dramatically exceeds cost of systems
[Figure: Service.com reached over the Internet, exposed to attacks and failure around the clock (9:00pm EST, 2:00am GMT, 11:00am JST)]

The Dream: A Defensive Architecture
[Figure: the same Internet service, exposed to attacks and failure around the clock (9:00pm EST, 2:00am GMT, 11:00am JST); each site sits behind a gateway and its machines carry backdoors (BD) joined over a private network]

Possible Healing Actions
- Refresh the state (reboot)
  - Destructive and disruptive
- Repair the state (continue)
- Recover the state (transfer)
How to access the memory of the failed system when the OS is “hung”?

The Motivating Philosophy
- Something is better than nothing
  - Save application state if possible
- Faster is better than slower
  - Repairing state is faster than repairing software
- It is hard to corrupt or stop an outsider
  - Remote healing is better than self-healing
- Attackers and faults are becoming “smarter”
  - Try a “holistic” approach if nothing else

The Backdoor (BD)
Backdoor: a hidden software or hardware mechanism, usually created for testing and troubleshooting
--American National Standard for Telecommunications

Backdoor Design Principles
1. Availability: BD must be highly available (even when the OS is not)
2. Non-intrusiveness: BD operations must not involve the local OS (zero-overhead monitoring)
3. Integrity: the OS cannot alter BD execution or modify the result of a BD operation
4. Responsiveness: a BD operation cannot be delayed indefinitely

Possible Backdoor Implementations
- A programmable network interface (I-NIC)
  - Our current prototype is on Myrinet
- A virtual machine over a VMM
  - Work in progress over Xen
- IBM’s Remote Supervisor Adapter?
- HP’s Remote Management Adapter?

Backdoor as building block
- Remote Healing Systems
  - A computer system monitors/repairs/recovers the state of a remote system through the backdoor
  - Backdoor is controlled by the remote OS
- Defensive Architectures
  - Backdoors are programmed to execute defensive tasks, stand-alone or cooperatively over a private network
  - Standalone backdoor

Outline
- Introduction
- Backdoor Idea
- Remote Healing
- Defensive Architectures
- Conclusions

Remote Healing
- Backdoor prototyped on I-NIC (Myrinet)
- Remote Repair of OS State
- Remote Recovery for Cluster-Based Internet Servers

Backdoor on I-NIC
[Figure: a host with CPU, memory and a NIC (the “front door”), plus an I-NIC acting as the backdoor and attached to a private network]
- Backdoor provides an alternative access to system memory without involving the local CPU/OS
- Private network over a specialized interconnect, a VPN, or even a phone link!

A Remote Healing Architecture
[Figure: a target system and a monitor system, each with CPU, memory, I/O and a BD; the two BDs are connected to each other]

Backdoors use Remote Memory Communication
[Figure: monitor and target, each with CPU, memory and a BD (NIC CPU); monitoring uses remote reads, recovery/repair uses remote reads and writes]

Remote OS Locking
- Implemented by a BD-OS protocol
- Two functions
  - Provides exclusive access to target OS data for state repairing
  - Enforces a fail-stop model in the recovery case to avoid the consequences of false positives in failure detection
- Can it be avoided?
  - Yes, for monitoring

OS Support for Remote Healing
- Monitoring and Failure Detection
  - Sensor Box: system health indicators (sensors) provided by the target OS in its local memory
  - Sensors: <UniqueID, Type, Threshold, Value>
- Repairing
  - Externalized State: OS state data that the BD can read
  - Remote Access Hooks: OS control data that the BD can write to perform repairing actions
- Recovery
  - Continuation Box: fine-grain OS and application checkpoint state that the BD can transfer between systems to migrate running applications

Sensor Box (SB)
- Collection of health indicators (sensors) in the target OS memory
- <ID, Type, Threshold, Value>
Sensor Type    Threshold
Progress       Update deadline
Level          Max/Min value
Pressure       Max number of events
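
The sensor record and the Level and Pressure rows of the table can be illustrated with a small sketch. The field names, the `threshold_violated` helper and the sample values are assumptions made for illustration; the slides give only the tuple layout and the meaning of each threshold. Progress sensors need history kept by the monitor, so they are left out of this sketch.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class Sensor:
    """One <UniqueID, Type, Threshold, Value> record (field names are illustrative)."""
    unique_id: int
    kind: str  # "level" or "pressure" in this sketch
    threshold: Union[float, Tuple[float, float]]
    value: float


def threshold_violated(sensor: Sensor) -> bool:
    """Return True when the sensor value is outside what its threshold allows."""
    if sensor.kind == "level":
        low, high = sensor.threshold  # Level: min/max value
        return not (low <= sensor.value <= high)
    if sensor.kind == "pressure":
        return sensor.value > sensor.threshold  # Pressure: max number of events
    raise ValueError(f"unsupported sensor type: {sensor.kind}")


# Hypothetical sensors: free pages must stay above 1000, events must stay below 500.
free_pages = Sensor(unique_id=1, kind="level", threshold=(1000, float("inf")), value=250)
swap_events = Sensor(unique_id=2, kind="pressure", threshold=500, value=120)
```

Here `threshold_violated(free_pages)` reports a violation because 250 is below the minimum, while `swap_events` stays within its limit.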

Failure Detection using Sensor Box
[Figure: the target OS updates the Sensor Box and the monitor reads it through the backdoor; example progress sensors are timer interrupts, context switches, NIC interrupts, …]
- Target OS updates progress sensors in SB continuously
- Monitoring thread reads SB periodically and checks counters
- Failure = counter stalled beyond its deadline
- False positive rate vs. detection latency tradeoff
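
A minimal sketch of the "counter stalled beyond its deadline" rule, assuming the monitor samples one progress counter together with a timestamp in milliseconds. The class name, method name and sample values are hypothetical; in the prototype the samples would come from remote reads of the Sensor Box.

```python
class ProgressMonitor:
    """Tracks one progress counter and flags it when it stops advancing."""

    def __init__(self, deadline_ms):
        self.deadline_ms = deadline_ms
        self.last_value = None
        self.last_change_ms = None

    def observe(self, value, now_ms):
        """Record a sample; return True if the counter stalled beyond its deadline."""
        if value != self.last_value:
            # The target made progress, so restart the stall clock.
            self.last_value = value
            self.last_change_ms = now_ms
            return False
        return (now_ms - self.last_change_ms) > self.deadline_ms


monitor = ProgressMonitor(deadline_ms=500)
# (time in ms, counter value): the counter stops advancing after t = 200 ms.
samples = [(0, 10), (200, 11), (400, 11), (600, 11), (800, 11)]
verdicts = [monitor.observe(value, t) for t, value in samples]
```

The last sample is the first one flagged, 600 ms after the last change. A shorter deadline would detect the stall sooner but would also flag a target that is merely slow, which is the false positive versus latency tradeoff named on the slide.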

Monitoring and Detection Using BD
[Figure: the monitor's BD gives it a remote view of the Sensor Box in the target's memory, which it uses for detection]

Diagnosis and Repairing
- Diagnosis
  - Inspect live OS data structures in the target’s memory (through the externalized state)
  - Identify damaged OS state (e.g. resource exhaustion due to memory-hogging processes)
- Repairing
  - Modify target OS memory (through remote access hooks) to correct damaged state (e.g. remove a memory-hogging process by “injecting” a kill signal in its process control block)

Diagnosis Using BD
[Figure: the monitor's BD gives it a fine-grained view of the externalized state in the target's memory, which it uses for diagnosis]

Repair Using BD
[Figure: the monitor's BD writes to a repair hook in the target's memory to restore correct state]

Case Study: Repairing OS State
- Damaged OS state: resource exhaustion, corrupted data structures, compromised OS, etc.
- Resource exhaustion
  - Attack, overload, system misconfiguration, programming error
  - Repairing cannot rely on local resources
- Two examples
  - Fork bomb
  - Memory hog

Case Study: Memory Hog
- Program allocates memory in an infinite loop
- Both memory and swap space are occupied by the memory hog
- System is inaccessible from the console or the network
  - Cannot spawn new processes
  - Cannot handle interrupts
  - Local daemons cannot repair the system

Remote Repairing in Case of Memory Hogging
- Monitoring
  - Pressure sensor signals when a severe low-memory condition is detected
- Diagnosis
  - Target externalizes process table and process memory usage statistics
  - Monitoring thread identifies the culprit
- Repairing
  - Monitoring thread kills the culprit by remotely posting a SIGKILL
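
The diagnosis and repair steps above can be sketched as follows. This is a toy model, not the prototype: the externalized process table is a list of Python dictionaries, the field names are invented, and "posting" a signal is modeled as adding the signal number to a pending set. In the real system both steps would be remote reads and writes on the target's memory.

```python
SIGKILL = 9  # conventional SIGKILL number on POSIX systems


def find_culprit(process_table):
    """Diagnosis: pick the process holding the most memory (a deliberately simple policy)."""
    return max(process_table, key=lambda proc: proc["resident_pages"])


def post_signal(process_table, pid, signo):
    """Repair: mark the signal as pending in the process entry, as a remote write would."""
    for proc in process_table:
        if proc["pid"] == pid:
            proc["pending_signals"].add(signo)
            return True
    return False


# Hypothetical snapshot of the externalized process table.
process_table = [
    {"pid": 1, "name": "init", "resident_pages": 120, "pending_signals": set()},
    {"pid": 42, "name": "hog", "resident_pages": 250_000, "pending_signals": set()},
    {"pid": 77, "name": "httpd", "resident_pages": 4_000, "pending_signals": set()},
]
culprit = find_culprit(process_table)
posted = post_signal(process_table, culprit["pid"], SIGKILL)
```

A real policy would likely exclude essential processes and look at growth rate as well as size; largest-resident-set is used here only to keep the example short.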

Prototype
- BD implemented on Myrinet LanaiX NIC
  - Modified firmware and low-level GM library
- Modified FreeBSD 4.8 kernel
- Experimental setup
  - Dell PowerEdge 2600 servers with 2.4 GHz dual Intel Xeon, 1GB RAM, 2GB swap, Myrinet LanaiX NIC
  - Benchmark: simple counting program with fixed number of iterations

Effectiveness of Remote Repairing
[Figure: execution time (s, 0 to 20) versus number of memory hog processes (0 to 16), for the impaired system and with remote repair]

Repairing Timeline
[Figure: timeline over 0 to 3 s marking memory pressure, detection, diagnosis & repair, and end of repair; the remote repair phase is followed by local cleanup of damaged state]

Remote Healing
- Backdoor prototype using Myrinet
- Remote Repair of OS State
- Remote Recovery for Cluster-based Internet Servers

Clusters with BD Network
[Figure: four cluster nodes, each with P, M, I/O and a BD, joined by an interconnect; the nodes carry M and T labels]

Cluster-based Internet Services with BD network
[Figure: three cluster nodes, each running a server and a monitor, serving three clients]

Continuation Box (CB)
- Idea
  - Define per client-session state (OS and application)
  - Transfer client sessions from the failed system to other systems in the cluster running the same server application
- CB encapsulates the state of a client session associated with a server application (possibly multi-process)
  - OS state (data in transit through IPC channels)
  - Application-specific state (periodically exported/checkpointed by the application)

Continuation Box Extraction
[Figure: the healthy recovery machine uses its BD to read the Continuation Box from the memory of the crashed victim machine and passes the recovered state to its OS]

Client-Session Continuation Box for Multi-Process Servers
[Figure: Client 1 and Client 2 reach Process 1 and Process 2 over TCP/IP, with IPC between the processes; each client session has its own box (CB1, CB2) holding application state and communication state]

Continuation Box API
- create_cb for a client session
- export application state to CB
- associate I/O channel with the CB
- open_cb given an I/O channel
- import application state from CB

Changes to make Server Recoverable
    while (cid = accept()) {
        cbid = create_cb(cid)
        /* a migrated session resumes from its exported state; a new session starts at offset 0 */
        if (import(cbid, &{file_name, offset}) == NULL) {
            receive(cid, file_name)
            offset = 0
        }
        fd = open(file_name)
        seek(fd, offset)
        while (read(fd, block, size) != EOF) {
            send(cid, block, size)
            offset += size
            export(cbid, {file_name, offset})   /* checkpoint progress after every block */
        }
    }
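
The same control flow can be exercised end to end with a toy model. Everything below is an assumption made for illustration: `CBStore` is an in-memory stand-in for continuation boxes, `export_state` and `import_state` stand in for the slide's export and import calls (renamed because `import` is a Python keyword), and a "crash" is simulated by returning early.

```python
class CBStore:
    """Hypothetical in-memory stand-in for continuation boxes, keyed by client id."""

    def __init__(self):
        self._boxes = {}

    def create_cb(self, cid):
        self._boxes.setdefault(cid, None)
        return cid

    def export_state(self, cbid, state):
        self._boxes[cbid] = dict(state)

    def import_state(self, cbid):
        return self._boxes.get(cbid)


def serve(store, cid, files, requested_name, sent, block_size=4, crash_after=None):
    """Send a file to one client; return False if the simulated crash hit first."""
    cbid = store.create_cb(cid)
    state = store.import_state(cbid)
    if state is None:
        file_name, offset = requested_name, 0  # new session
    else:
        file_name, offset = state["file_name"], state["offset"]  # migrated session
    data = files[file_name]
    blocks_sent = 0
    while offset < len(data):
        if crash_after is not None and blocks_sent == crash_after:
            return False  # simulated failure of this server
        block = data[offset:offset + block_size]
        sent.append(block)
        offset += len(block)
        store.export_state(cbid, {"file_name": file_name, "offset": offset})
        blocks_sent += 1
    return True


files = {"index.html": b"0123456789"}
store, sent = CBStore(), []
first = serve(store, 7, files, "index.html", sent, crash_after=1)  # victim fails after one block
second = serve(store, 7, files, None, sent)  # recovery machine resumes from the box
```

After both calls the client has received every byte exactly once. The sketch only "crashes" between iterations; a failure between the send and the export would leave the exported offset behind the data already sent.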

State Synchronization Problem
- Application state (SB_APP) updated only upon export
- OS state (SB_IO) updated continuously by the OS kernel
- How to synchronize the two components of the CB?
[Figure: three views of the box and its SB_IO (OS) and SB_APP (application) parts, labeled export, import and RECV]

CB-based Recovery
- Log-based rollback recovery
  - Restores server state with respect to a client
- OS keeps communication logs (send/receive)
  - 0-copy using the communication buffers
- After migration, OS replays send/receive operations from logs
  - Transparent to server and client applications
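
One possible reading of the replay step, as a sketch: after migration the application restarts from its last exported state, receives are served from the log, and bytes that were already delivered to the client are suppressed rather than sent twice. The class, its fields and the sample data are hypothetical; the slides do not describe the log format or the replay algorithm at this level.

```python
class ReplayChannel:
    """Replays logged I/O for an application restarted from its last export."""

    def __init__(self, receive_log, bytes_already_sent, network):
        self.receive_log = list(receive_log)  # messages received since the last export
        self.to_skip = bytes_already_sent     # bytes the client already has since the last export
        self.network = network                # list collecting what really goes out

    def receive(self, live_source):
        """Serve logged messages first, then fall back to live traffic."""
        if self.receive_log:
            return self.receive_log.pop(0)
        return live_source.pop(0)

    def send(self, data):
        """Drop the prefix the client already received; forward only new bytes."""
        skipped = min(self.to_skip, len(data))
        self.to_skip -= skipped
        if data[skipped:]:
            self.network.append(data[skipped:])


network = []
channel = ReplayChannel(receive_log=[b"GET a"], bytes_already_sent=4, network=network)
request = channel.receive(live_source=[])  # comes from the log, not the wire
channel.send(b"01")    # fully suppressed: the client already has these bytes
channel.send(b"2345")  # first two bytes suppressed, b"45" goes out
```

Neither the server code nor the client sees the difference, which is the sense in which the replay is transparent.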

Backdoors Prototype
- Myrinet LanaiX NIC as backdoor
  - In-kernel remote read/write operations
- Modified FreeBSD kernel
  - Sensor Box, Continuation Box
- Modified server applications
  - Apache, Flash, Icecast, JBoss

Case Study: A Multi-tier Auction Service
Recoverable RUBiS
[Figure: three tiers: Front-End (FE) Apache web server, Middle Tier (MT) JBoss app. server, Back-End MySQL DB server]

Experimental Evaluation
- Experimental setup
  - Dell PowerEdge 2600 servers, 2.4 GHz dual Intel Xeon, 1GB RAM, 1Gb Ethernet
  - Workload modeled after TPC-W
- Fault injection in FE and MT nodes
  - Synthetic freeze, emulated freeze by remote OS locking, bugs inserted in network drivers
- Evaluation
  - Low overhead under load
  - Recovery is fast

Low Overhead under Load
[Figure: throughput (requests/min, 0 to 8,000) versus number of clients (20 to 1,100) for Base, Recoverable FE, and Recoverable FE+MT]

Recovery is Fast
[Figure: timeline over 0 to 30 ms marking failure, detection, import CB, and end of recovery; detection latency is followed by recovery latency]

Outline
- Introduction
- Backdoor Idea
- Remote Healing Experience
- Defensive Architectures
- Conclusions

Autonomous Backdoor
- BD is programmed to execute defensive tasks, then “sealed”

Defensive Architecture Hierarchy
- Defensive Computer Architecture (DCA)
  - Individual computers equipped with BD
  - BD performs local defensive tasks (e.g. OS state inspection)
- Defensive Network Architecture (DNA)
  - Cluster nodes equipped with BDs connected over a high-speed private network
  - BDs perform defensive tasks cooperatively (e.g. OS integrity checking, continuous remote logging)
- Defensive Inter-Network Architectures (DINA)
  - Loosely coupled DNAs connected over the Internet or other networks
  - DNAs cooperate (e.g. early warnings of virus attacks)

Local Memory Inspection (Work in Progress)
- Kernel Integrity Monitoring & Healing
  - Search for kernel rootkits
    - individual kernel functions
    - kernel tables, e.g. syscall
    - dynamic structures, e.g. the process table, etc.
  - Repair the kernel when compromised
    - Replace tampered tables with clean versions
    - Replace corrupt versions of kernel functions with clean ones
- Holistic Approach to System Failure Prediction
  - Identify kernel memory update patterns and correlate them to predict unstable system states
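
The table check and repair listed above can be sketched as a comparison against a known-clean copy. The addresses, the list-of-integers table model and both function names are invented for illustration; how the BD obtains and protects the clean copy is outside this sketch.

```python
def find_tampered(live_table, clean_table):
    """Return the indices whose handler address differs from the clean copy."""
    if len(live_table) != len(clean_table):
        raise ValueError("tables differ in length")
    return [
        index
        for index, (live, clean) in enumerate(zip(live_table, clean_table))
        if live != clean
    ]


def repair(live_table, clean_table):
    """Overwrite tampered entries with their clean values; return what was fixed."""
    tampered = find_tampered(live_table, clean_table)
    for index in tampered:
        live_table[index] = clean_table[index]
    return tampered


# Hypothetical syscall table: one entry has been redirected by a rootkit.
clean = [0xC0100000, 0xC0100040, 0xC0100080]
live = list(clean)
live[1] = 0xDEADBEEF
hooked = find_tampered(live, clean)
```

Calling `repair(live, clean)` restores entry 1. Comparing table entries only catches redirected pointers; a rootkit that patches the body of a kernel function would need the per-function check that the slide lists separately.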

Defensive Inter-Network Architecture over PlanetLab (new project)
[Figure: an Internet service exposed to attacks and failure around the clock (9:00pm EST, 2:00am GMT, 11:00am JST); each site sits behind a gateway and its machines carry BDs joined over a private network]

Related Work
- DEC WRL Titan system [’86]
- Recoverable OS subsystems
  - Rio reliable file cache [Chen ’96]
  - Recovery Box [Baker ’92]
- Defensive Programming [Qie ’03]
- Nooks [Swift ’04]
- Recovery Oriented Computing [Patterson ’02]
- Microreboot [Candea ’04]
- TCP Connection Failover [Snoeren ’01, Sultan ’01, Alvisi ’01, Koch ’03, Mishra ’03, Zagorodnov ’03]
- Automatic repair of data structures [Demski ’03]
- K42 [Soules ’03]
- Hypervisor-based fault tolerance [Bressoud ’95]

Conclusions
- Backdoor is a promising building block for remote healing and defensive architectures
- Feasibility studies for Remote Repairing and Remote Recovery using an I-NIC-based Backdoor prototype
- Current work on Defensive Architectures

People and Money Behind Backdoors
- Florin Sultan
- Aniruddha Bohra
- Pascal Gallard (INRIA/IRISA, France)
- Iulian Neamtiu (University of Maryland)
- Stephen Smaldone
- Yufei Pan
- Arati Baliga
- Tzvika Chumash
- NSF CAREER CCR-0133366

Thank You!
http://discolab.rutgers.edu/bda

Yes, BD Security! (work in progress)
- BD under OS control
  - Access to remote memory controlled through memory registration (established at initialization time)
  - Voting scheme for remote writes (delayed writes)
  - BDs monitor each other and their OSes’ integrity
- Autonomous BD
  - OS cannot access BD memory after initialization (possible with PCI Express)
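
The voting scheme for delayed remote writes is only named on the slide. The sketch below shows one way it could work, assuming a write is held back until a strict majority of the registered BDs has requested the same bytes at the same address. The class, the quorum rule and the voter names are all assumptions.

```python
class VotedMemory:
    """Applies a remote write only after a majority of registered voters agree on it."""

    def __init__(self, size, voters):
        self.memory = bytearray(size)
        self.voters = set(voters)
        self.pending = {}  # (address, data) -> set of voters that requested it

    def request_write(self, voter, address, data):
        """Register one vote; return True if the write was applied as a result."""
        if voter not in self.voters:
            raise PermissionError(f"unregistered voter: {voter}")
        key = (address, bytes(data))
        votes = self.pending.setdefault(key, set())
        votes.add(voter)
        if 2 * len(votes) > len(self.voters):  # strict majority reached
            self.memory[address:address + len(key[1])] = key[1]
            del self.pending[key]
            return True
        return False  # the write stays delayed


target = VotedMemory(size=8, voters={"bd-a", "bd-b", "bd-c"})
first_vote = target.request_write("bd-a", 4, b"\x01\x02")   # 1 of 3: delayed
second_vote = target.request_write("bd-b", 4, b"\x01\x02")  # 2 of 3: applied
```

With three voters, a single compromised BD cannot change the target's memory on its own, at the cost of delaying every legitimate write until a second BD concurs.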