ryancox
Computer Science
SecureMR - Practical Hadoop Security
Triangle Hadoop Users Group September 14th, 2010
SecureMR - Overview
Long-term Goal
o Deploy MapReduce over open systems with security guarantee
Motivation
o Industry: Google, Yahoo!, Facebook
o Academia: Machine Learning, Data Intensive Computation, Image Processing
Our Focus
o Provide integrity assurance for MapReduce in open systems
Basic Idea
o Adopt a replication-based scheme
o Decentralize integrity verification
Outline
o Introduction
o System Model
o System Design
o Analysis and Evaluation
o Related Work
o Conclusion
MapReduce Overview
[Figure: MapReduce data flow. The Master assigns map tasks to mappers M1 … Mn and reduce tasks to reducers R1 … Rr. In the map phase, mappers read input blocks B1 … Bn from the DFS and write the intermediate result locally as partitions P1 … Pr. In the reduce phase, reducers read their partitions remotely and write Output 1 … Output r to the DFS.]
MapReduce – WordCount Application
Input (DFS):
Hello World, Bye World!
Hello MapReduce, Goodbye to MapReduce.
Welcome to ACSAC, Goodbye to ACSAC.

Map Phase, intermediate result (mappers M1, M2, M3, one group per input line):
(Hello, 1) (Bye, 1) (World, 1) (World, 1)
(Hello, 1) (to, 1) (MapReduce, 1) (Goodbye, 1) (MapReduce, 1)
(Welcome, 1) (to, 1) (to, 1) (ACSAC, 1) (Goodbye, 1) (ACSAC, 1)

Reduce Phase, output (DFS):
R1: (Hello, 2) (Bye, 1) (Welcome, 1) (to, 3)
R2: (World, 2) (ACSAC, 2) (Goodbye, 2) (MapReduce, 2)
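The WordCount flow on this slide can be sketched in a few lines of Python. This is only an illustrative single-process sketch, not Hadoop code: the function names (`map_task`, `partition`, `reduce_task`, `word_count`) are hypothetical, and the CRC32-based partitioner is an assumption, so words may land on different reducers than the R1/R2 split shown on the slide. The final counts are the same.

```python
import re
import zlib
from collections import defaultdict

def map_task(line):
    """Map phase: emit one (word, 1) pair per word in an input line."""
    return [(word, 1) for word in re.findall(r"[A-Za-z]+", line)]

def partition(word, num_reducers):
    """Choose a reducer for a key (CRC32 is an assumed, deterministic choice)."""
    return zlib.crc32(word.encode("utf-8")) % num_reducers

def reduce_task(pairs):
    """Reduce phase: sum the counts for each word in one partition."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

def word_count(lines, num_reducers=2):
    # One map task per input line, mirroring M1..M3 on the slide.
    partitions = [[] for _ in range(num_reducers)]
    for line in lines:
        for word, count in map_task(line):
            partitions[partition(word, num_reducers)].append((word, count))
    # Each reducer processes one partition; merge the outputs for display.
    result = {}
    for pairs in partitions:
        result.update(reduce_task(pairs))
    return result

lines = [
    "Hello World, Bye World!",
    "Hello MapReduce, Goodbye to MapReduce.",
    "Welcome to ACSAC, Goodbye to ACSAC.",
]
counts = word_count(lines)
print(counts["to"], counts["World"])  # prints: 3 2
```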
System Model
Goal
o Deploy MapReduce over open systems with integrity assurance
An open system differs from a closed system
Attacks against MapReduce in open systems
o Communication attacks
    o Eavesdropping, DoS and replay attacks
o Data processing service integrity attacks (Our Focus)
    o Insert fake data, tamper with data and drop data
System Model – Integrity Attacks
[Figure: MapReduce data flow with the Master, mappers M1 … Mn, reducers R1 … Rr, input blocks B1 … Bn in the DFS, intermediate partitions P1 … Pr, and outputs Output 1 … Output r in the DFS.]
System Model
Assumptions
o PKI is deployed in advance
o Master is trusted
o DFS provides data integrity protection [Atallah, et al., ICDE’08]
Attack Models
o Non-collusive malicious behavior
o Collusive malicious behavior
SecureMR
Basic Idea
o Adopt a replication-based scheme (integrity)
A Naive Approach
[Figure: Mappers Ma and Mb read the same input blocks (B1 … Bn), process them into partitions P1 … Pr, and each send their results to the master, which compares them (HP1 … == ???). The intermediate result is then sent to reducers Ra and Rb. Questions raised on the slide: Scalability? Integrity?]
A Naive Approach
[Figure: Mappers Ma and Mb read the same input blocks (B1 … Bn) and each send their results for partitions P1 … Pr to the master, which compares the hashes HP1, HP2, … received from both mappers (==). Reducers Ra and Rb are also shown.]
A Naive Approach
[Figure: Mappers Ma and Mb read the same input blocks (B1 … Bn) and send their results to the master, where the hashes HP1, HP2, … from both mappers match (==). A mapper then sends a tampered result to the reducer. Reducers Ra and Rb each produce Output 1, and the two outputs are compared (==).]
SecureMR
Basic Idea
o Adopt a replication-based scheme (integrity)
o Decentralize integrity verification (scalability & integrity)
Design Goals
o Security
    o Non-repudiation, resilience to DoS and replay attacks
o Performance
    o Minimize computation cost and network communications
o Applicability
    o Preserve existing protocol as much as possible
SecureMR – Architecture Design
[Figure: layered architecture]
o User Applications
o MapReduce
    o Master: Scheduler
    o Mapper: Task Executor
    o Reducer: Task Executor
o Open Systems: Grid Computing, Volunteer Computing and P2P Computing
o Network Infrastructure
SecureMR – Architecture Design
[Figure: layered architecture with SecureMR components]
o User Applications
o SecureMR
    o Master: Secure Scheduler, Secure Manager
    o Mapper: Secure Task Executor, Secure Committer
    o Reducer: Secure Task Executor, Secure Verifier
o Open Systems: Grid Computing, Volunteer Computing and P2P Computing
o Network Infrastructure
SecureMR – Communication Design
[Figure: message flow among the Master, mappers M1 … Mn, reducers R1 … Rr and the DFS (input blocks B1 … Bn), covering the map phase and the reduce phase and marked with a Commitment part and a Verification part. Numbered steps in the figure:]
1.1, 1.2. Assign
2. Read
3. Process
4. Commit
5. Compare
6. Assign
7. Notify
8. Request
9. Response
10. Verify
SecureMR – Commitment Protocol
[Figure: Mappers Ma and Mb each read the same input blocks (B1 … Bn), produce partitions P1 … Pr, compute the hashes HP1, HP2, … HPr, sign them ({H}sig), and send the hashes to the master. The master checks that the hashes from the two mappers are equal (==).]
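The commitment step can be illustrated with a minimal Python sketch. It assumes SHA-256 as the hash function and represents each partition as a byte string; both are assumptions, since the slides do not name a hash algorithm or data format. The signature over the hashes ({H}sig) is omitted here, and the function names `commit` and `inconsistent_partitions` are hypothetical.

```python
import hashlib

def commit(partitions):
    """Mapper side: hash each intermediate partition (P1..Pr -> HP1..HPr).

    Signing the hashes is omitted in this sketch.
    """
    return [hashlib.sha256(p).hexdigest() for p in partitions]

def inconsistent_partitions(commit_a, commit_b):
    """Master side: indices of partitions whose committed hashes differ."""
    if len(commit_a) != len(commit_b):
        raise ValueError("replicas must commit the same number of partitions")
    return [i for i, (ha, hb) in enumerate(zip(commit_a, commit_b)) if ha != hb]

# Two replicas of the same map task; the second drops a record in its second partition.
honest = [b"(Hello, 1)(Bye, 1)", b"(World, 1)(World, 1)"]
tampered = [b"(Hello, 1)(Bye, 1)", b"(World, 1)"]

print(inconsistent_partitions(commit(honest), commit(honest)))    # prints: []
print(inconsistent_partitions(commit(honest), commit(tampered)))  # prints: [1]
```

Because only fixed-size hashes travel to the master, the master's comparison cost does not grow with the size of the intermediate data.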
SecureMR – Verification Protocol
[Figure: Mappers Ma and Mb read the input blocks (B1 … Bn), produce partitions P1 … Pr with hashes HP1 … HPr, and send the signed hashes to the master. The master notifies each reducer with the corresponding signed hash (Notify & {HP1}sig for R1, …, Notify & {HPr}sig for Rr). Each reducer reads its partition, calculates the hash itself (H’P1, …, H’Pr), and checks HP1 == H’P1?, …, HPr == H’Pr?]
SecureMR – Verification Protocol
[Figure: the same protocol shown for reducer R1 only. The master sends Notify & {HP1}sig to R1; R1 reads partition P1, calculates H’P1, and the check HP1 == H’P1 holds.]
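The reducer-side check can be sketched as follows. As before, SHA-256 and byte-string partitions are assumptions, signature verification of {HP1}sig is left out, and `verify_partition` is a hypothetical name. The example shows the case the naive approach misses: a mapper commits an honest hash to the master but serves tampered data to the reducer.

```python
import hashlib

def verify_partition(committed_hash, received_partition):
    """Reducer side: recompute H'P from the data actually read and compare to HP."""
    recomputed = hashlib.sha256(received_partition).hexdigest()
    return recomputed == committed_hash

# Hash committed to the master and forwarded to the reducer in the notification.
honest_partition = b"(Hello, 1)(Hello, 1)"
committed = hashlib.sha256(honest_partition).hexdigest()

# The mapper serves the honest data: the check passes.
print(verify_partition(committed, honest_partition))        # prints: True
# The mapper serves tampered data after committing: the check fails.
print(verify_partition(committed, b"(Hello, 1)(Bye, 1)"))   # prints: False
```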
MapReduce in Open Systems – Integrity
[Figure: MapReduce data flow. The Master assigns map tasks to mappers M1 … Mn and reduce tasks to reducers R1 … Rr. Mappers read input blocks B1 … Bn from the DFS and write the intermediate result locally as partitions P1 … Pr. Reducers read their partitions remotely and write Output 1 … Output r to the DFS.]
SecureMR – Analysis
Security Analysis
o No false alarm
o Non-repudiation
Attacker Behavior Analysis
o Periodical attackers without collusion (Detection Rate)
o Periodical attackers with collusion (Detection Rate)
o Strategic attackers (Misbehaving Probability)
Detection Rate
o We define the detection rate, denoted Drate, as the probability that the inconsistency between results caused by the misbehavior is detected during l jobs.
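To make the definition concrete, here is a deliberately simplified toy model in Python. It is not the analysis from the slides or the paper. It assumes a single non-colluding malicious worker that runs one map task per job, misbehaves on it with probability pm, and is caught whenever that task happens to be duplicated on an honest worker, which occurs with probability pb. Under those assumptions a job exposes the attacker with probability pm * pb, and the function name `toy_detection_rate` is hypothetical.

```python
def toy_detection_rate(pm, pb, num_jobs):
    """Probability that misbehavior is detected at least once in num_jobs jobs.

    Toy assumptions (not from the slides): one malicious worker, no collusion,
    one task per job, independent jobs, detection iff the task is duplicated.
    """
    if not (0.0 <= pm <= 1.0 and 0.0 <= pb <= 1.0):
        raise ValueError("pm and pb must be probabilities in [0, 1]")
    if num_jobs < 0:
        raise ValueError("num_jobs must be non-negative")
    per_job = pm * pb  # misbehaves AND the task is duplicated in this job
    return 1.0 - (1.0 - per_job) ** num_jobs

# In this toy model the rate grows with both the duplication rate and the number of jobs.
for pb in (0.2, 0.4, 0.8):
    print(pb, round(toy_detection_rate(pm=0.5, pb=pb, num_jobs=15), 4))
```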
SecureMR – Analysis
[Figure: Detection Rate for Collusive Periodical Attacker]
• # of workers n = 50
• misbehaving probability pm = 0.5
• # of blocks b = 20
• # of jobs l = 15
• pb – duplication rate
• m – # of malicious workers
SecureMR – Evaluation
System Implementation
o Implementation based on Hadoop
o Two scheduling algorithms for comparison
    o Naive task scheduling algorithm
    o Commitment-based task scheduling algorithm
o Non-blocking consistency verification
Experiment Setup
o 14 hosts in Virtual Computing Lab (VCL)
o 2.66GHz Intel(R) Core(TM) 2 Duo
o Ubuntu Linux 8.04, Sun JDK 6 and Hadoop 0.19
o Hadoop WordCount application
SecureMR – Evaluation
[Figure: Response Time vs Duplication Rate]
• # of map tasks = 60
• # of reduce tasks = 25
• size of input data = 1GB
Response Time
o We define the response time as the time to finish map and reduce tasks in a job.
Related Work
Research related to MapReduce
    o Machine Learning [Cheng, et al., NIPS 2006]
    o Data Intensive Computing [Ekanayake, et al., eScience 2008]
    o Semantic Annotation [Laclavík, et al., ICCS 2008]
o Little attention has been paid to integrity protection in MapReduce
Related techniques
    o Sampling for uncheatable grid computing [Du, et al., ICDCS 2004]
    o Quiz for result verification [Zhao, et al., P2P 2005]
    o Majority voting and spot-checking [Sarmenta, et al., FGCS 2002]
o None of them addressed unique challenges like massive data processing and multi-party distributed computation
Research on system security
    o Securing publish-subscribe services [Srivatsa, et al., CCS 2005]
    o PeerReview in distributed systems [Haeberlen, et al., SOSP 2007]
o SecureMR focuses on a different domain
Conclusion
To the best of our knowledge, our work is the first attempt to provide integrity assurance for MapReduce in open systems.
Contributions
o A decentralized replication-based integrity verification scheme
o A prototype of SecureMR
o Analytical study and experimental evaluation of performance overhead
Future Work
o Explore other techniques to address collusion attacks
o Provide data quality assurance for the final result
Thank you
Questions?