SecureMR - Practical Hadoop Security. Computer Science. Triangle Hadoop Users Group, September 14th, 2010.


Page 1: Tri hug 2010   wei

Computer Science

SecureMR - Practical Hadoop Security

Triangle Hadoop Users Group September 14th, 2010


Page 2: Tri hug 2010   wei

SecureMR - Overview

Long-term Goal
  o Deploy MapReduce over open systems with security guarantees

Motivation
  o Industry
      Google, Yahoo!, Facebook
  o Academia
      Machine Learning, Data Intensive Computation, Image Processing

Our Focus
  o Provide integrity assurance for MapReduce in open systems

Basic Idea
  o Adopt a replication-based scheme
  o Decentralize integrity verification

Page 3: Tri hug 2010   wei

Outline

Introduction
System Model
System Design
Analysis and Evaluation
Related Work
Conclusion

Page 4: Tri hug 2010   wei

MapReduce Overview

[Figure: MapReduce data flow. The master assigns MapTasks to mappers M1 ... Mn and ReduceTasks to reducers R1 ... Rr. In the map phase, each mapper reads an input block (B1 ... Bn) from the DFS and writes its intermediate result locally, partitioned as P1 ... Pr. In the reduce phase, each reducer remotely reads its partition from the mappers and writes its output (Output 1 ... Output r) to the DFS.]

Page 5: Tri hug 2010   wei

MapReduce – WordCount Application

[Figure: WordCount data flow from the DFS through the map phase (M1, M2, M3), the intermediate result, and the reduce phase (R1, R2), back to the DFS.]

Input
    Hello World, Bye World!
    Hello MapReduce, Goodbye to MapReduce.
    Welcome to ACSAC, Goodbye to ACSAC.

Intermediate Result (one pair per word occurrence)
    (Hello, 1) (World, 1) (Bye, 1) (World, 1)
    (Hello, 1) (MapReduce, 1) (Goodbye, 1) (to, 1) (MapReduce, 1)
    (Welcome, 1) (to, 1) (ACSAC, 1) (Goodbye, 1) (to, 1) (ACSAC, 1)

Output of R1 and R2
    (Hello, 2) (Bye, 1) (Welcome, 1) (to, 3)
    (World, 2) (ACSAC, 2) (Goodbye, 2) (MapReduce, 2)
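The WordCount flow on this slide can be sketched in a few lines of plain Python. This is only an illustration of the map, partition, and reduce steps, not Hadoop code: the function names `map_words`, `partition`, and `reduce_counts`, the one-line-per-mapper split, and the toy partitioner are all assumptions made for the example.

```python
import re
from collections import defaultdict

def map_words(line):
    """Map task: emit one (word, 1) pair per word occurrence."""
    return [(word, 1) for word in re.findall(r"[A-Za-z]+", line)]

def partition(pairs, r):
    """Split a mapper's output into r partitions P1 ... Pr by key.

    A deterministic toy partitioner (sum of code points mod r) is assumed,
    so the same word always lands in the same partition.
    """
    parts = [[] for _ in range(r)]
    for word, count in pairs:
        parts[sum(map(ord, word)) % r].append((word, count))
    return parts

def reduce_counts(pairs):
    """Reduce task: sum the counts for each word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

lines = [
    "Hello World, Bye World!",
    "Hello MapReduce, Goodbye to MapReduce.",
    "Welcome to ACSAC, Goodbye to ACSAC.",
]
R = 2  # number of reducers, as on the slide

# Map phase: one mapper per input line (an assumption for this sketch).
mapper_outputs = [partition(map_words(line), R) for line in lines]

# Reduce phase: reducer i reads partition i from every mapper.
reducer_outputs = [
    reduce_counts(pair for parts in mapper_outputs for pair in parts[i])
    for i in range(R)
]

# Merge the reducers' outputs; the key sets are disjoint by construction.
word_counts = {}
for out in reducer_outputs:
    word_counts.update(out)
```

Which words end up at which reducer depends on the partitioner, so the split between R1 and R2 here need not match the slide; the merged counts do.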

Page 6: Tri hug 2010   wei

Outline

Introduction
System Model
System Design
Analysis and Evaluation
Related Work
Conclusion

Page 7: Tri hug 2010   wei

System Model

Goal
  o Deploy MapReduce over open systems with integrity assurance

Open system is different from closed system

Attacks against MapReduce in open systems
  o Communication attacks
      Eavesdropping, DoS and replay attacks
  o Data processing service integrity attacks (Our Focus)
      Insert fake data, tamper data and drop data

Page 8: Tri hug 2010   wei

System Model – Integrity Attacks

[Figure: the MapReduce data flow (master, mappers M1 ... Mn, reducers R1 ... Rr, input blocks B1 ... Bn in the DFS, intermediate partitions P1 ... Pr, Output 1 ... Output r in the DFS), used to illustrate integrity attacks in the map and reduce phases.]

Page 9: Tri hug 2010   wei

System Model

Assumptions
  o PKI is deployed in advance
  o Master is trusted
  o DFS provides data integrity protection [Atallah, et al., ICDE’08]

Attack Models
  o Non-collusive malicious behavior
  o Collusive malicious behavior

Page 10: Tri hug 2010   wei

Outline

Introduction
System Model
System Design
Analysis and Evaluation
Related Work
Conclusion

Page 11: Tri hug 2010   wei

SecureMR

Basic Idea
  o Adopt a replication-based scheme (integrity)

Page 12: Tri hug 2010   wei

A Naive Approach

[Figure: two mappers, Ma and Mb, read and process the same input block (from B1 ... Bn), each producing partitions P1 ... Pr. Both send their results to the master, which compares them (HP1 ... from Ma against HP1 ... from Mb, marked "== ???"). The intermediate result is then sent to the reducers Ra and Rb.]

Scalability? Integrity?

Page 13: Tri hug 2010   wei

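A Naive Approach

[Figure: Ma and Mb read the same input block (from B1 ... Bn), produce partitions P1 ... Pr, and send their results to the master. The master compares the per-partition hashes HP1, HP2, ... from Ma with HP1, HP2, ... from Mb. Reducers Ra and Rb are shown downstream.]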

Page 14: Tri hug 2010   wei

A Naive Approach

[Figure: Ma and Mb read the same input block (from B1 ... Bn), produce partitions P1 ... Pr, and send their results to the master, where the hashes HP1, HP2, ... from both mappers match. A mapper then sends a tampered result to the reducer. The figure also shows Output 1 of Ra compared (==) with Output 1 of Rb.]

Page 15: Tri hug 2010   wei

SecureMR

Basic Idea
  o Adopt a replication-based scheme (integrity)
  o Decentralize integrity verification (scalability & integrity)

Design Goals
  o Security
      Non-repudiation, resilience to DoS and replay attacks
  o Performance
      Minimize computation cost and network communications
  o Applicability
      Preserve existing protocol as much as possible

Page 16: Tri hug 2010   wei

SecureMR – Architecture Design

[Figure: MapReduce architecture. Components shown: User Applications; MapReduce, consisting of Master (Scheduler), Mapper (Task Executor) and Reducer (Task Executor); Open Systems (Grid Computing, Volunteer Computing and P2P Computing); Network Infrastructure.]

Page 17: Tri hug 2010   wei

SecureMR – Architecture Design

[Figure: SecureMR architecture. Components shown: User Applications; SecureMR, consisting of Master (Secure Scheduler, Secure Manager), Mapper (Secure Task Executor, Secure Committer) and Reducer (Secure Task Executor, Secure Verifier); Open Systems (Grid Computing, Volunteer Computing and P2P Computing); Network Infrastructure.]

Page 18: Tri hug 2010   wei

SecureMR – Communication Design

[Figure: message flow among the master, mappers M1 ... Mn, reducers R1 ... Rr, and the DFS holding input blocks B1 ... Bn, across the map and reduce phases. Messages are marked as belonging to either Commitment or Verification.]

Steps labeled in the figure
    1.1. Assign
    1.2. Assign
    2. Read
    3. Process
    4. Commit
    5. Compare
    6. Assign
    7. Notify
    8. Request
    9. Response
    10. Verify

Page 19: Tri hug 2010   wei

SecureMR – Commitment Protocol

[Figure: mappers Ma and Mb read the same input block (from B1 ... Bn). Each produces partitions P1 ... Pr, computes the hashes HP1, HP2, ... HPr, and signs them ({H}sig). Both mappers send their signed hashes to the master, which compares (==) the hashes from Ma with those from Mb.]
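The commitment step in the figure can be illustrated with a minimal Python sketch. Everything named here is hypothetical: `make_commitment`, `check_commitment`, and `master_compare` are not SecureMR APIs, SHA-256 is an assumed hash, and HMAC with a per-mapper key merely stands in for the PKI signature the slides assume. Because HMAC is symmetric, it does not give the non-repudiation that a real signature would.

```python
import hashlib
import hmac

def make_commitment(mapper_key, partitions):
    """Mapper side: hash each partition P1 ... Pr, then sign the hash list.

    HMAC is an assumed stand-in for a PKI signature ({H}sig on the slide).
    """
    hashes = [hashlib.sha256(p).hexdigest() for p in partitions]
    message = "|".join(hashes).encode()
    sig = hmac.new(mapper_key, message, hashlib.sha256).hexdigest()
    return {"hashes": hashes, "sig": sig}

def check_commitment(mapper_key, commitment):
    """Master side: check that the hash list really came from this mapper."""
    message = "|".join(commitment["hashes"]).encode()
    expected = hmac.new(mapper_key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, commitment["sig"])

def master_compare(key_a, commit_a, key_b, commit_b):
    """Master side: both commitments are authentic and the hashes agree."""
    return (
        check_commitment(key_a, commit_a)
        and check_commitment(key_b, commit_b)
        and commit_a["hashes"] == commit_b["hashes"]
    )

# Hypothetical keys and partition contents for mappers Ma and Mb.
key_a, key_b = b"key-of-Ma", b"key-of-Mb"
honest = [b"(Hello, 1)(Bye, 1)", b"(World, 1)(World, 1)"]
tampered = [b"(Hello, 1)(Bye, 1)", b"(World, 1)"]

consistent = master_compare(
    key_a, make_commitment(key_a, honest),
    key_b, make_commitment(key_b, honest),
)
inconsistent = master_compare(
    key_a, make_commitment(key_a, honest),
    key_b, make_commitment(key_b, tampered),
)
```

Only hashes travel to the master in this sketch, which is the point of the design: the master compares short digests rather than the intermediate data itself.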

Page 20: Tri hug 2010   wei

SecureMR – Verification Protocol

[Figure: mappers Ma and Mb read the same input block (from B1 ... Bn), produce partitions P1 ... Pr with hashes HP1, HP2, ... HPr, sign them ({H}sig), and send the hashes to the master. Reducer R1 is notified together with {HP1}sig, reads P1, calculates H'P1, and checks HP1 == H'P1? The same happens for each reducer up to Rr, which is notified together with {HPr}sig, reads Pr, calculates H'Pr, and checks HPr == H'Pr?]

Page 21: Tri hug 2010   wei

SecureMR – Verification Protocol

[Figure: the same setup for a single reducer. Ma and Mb send their signed hashes to the master. R1 is notified together with {HP1}sig, reads P1, calculates H'P1, and finds HP1 == H'P1.]
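The reducer-side check in the figure can be sketched as follows. This is a hedged illustration, not the SecureMR implementation: `sign_hash` and `reducer_verify` are hypothetical names, SHA-256 is an assumed hash, and HMAC with a per-mapper key again stands in for the mapper's PKI signature.

```python
import hashlib
import hmac

def sign_hash(mapper_key, partition_hash):
    """Mapper side: sign the committed hash of one partition ({HP1}sig).

    HMAC is an assumed stand-in for a PKI signature.
    """
    return hmac.new(mapper_key, partition_hash.encode(), hashlib.sha256).hexdigest()

def reducer_verify(mapper_key, committed_hash, sig, partition_data):
    """Reducer side: check the signed hash, then recompute H' over the data read."""
    expected_sig = sign_hash(mapper_key, committed_hash)
    if not hmac.compare_digest(expected_sig, sig):
        return False  # the notification does not carry an authentic commitment
    recomputed = hashlib.sha256(partition_data).hexdigest()  # H'P1
    return hmac.compare_digest(recomputed, committed_hash)   # HP1 == H'P1?

# Hypothetical mapper key and partition contents.
mapper_key = b"key-of-Ma"
p1 = b"(Hello, 1)(Hello, 1)(Bye, 1)"
hp1 = hashlib.sha256(p1).hexdigest()
hp1_sig = sign_hash(mapper_key, hp1)

# The mapper serves the data it committed to: verification passes.
accepted = reducer_verify(mapper_key, hp1, hp1_sig, p1)

# The mapper serves tampered data after committing: verification fails.
rejected = reducer_verify(mapper_key, hp1, hp1_sig, b"(Hello, 1)(Bye, 1)")
```

In this sketch the check runs at each reducer rather than at the master, which matches the deck's stated idea of decentralizing integrity verification.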

Page 22: Tri hug 2010   wei

MapReduce in Open Systems – Integrity

[Figure: MapReduce data flow. The master assigns MapTasks to mappers M1 ... Mn and ReduceTasks to reducers R1 ... Rr. Mappers read input blocks B1 ... Bn from the DFS and write intermediate results locally as partitions P1 ... Pr. Reducers remotely read the partitions and write Output 1 ... Output r to the DFS.]

Page 23: Tri hug 2010   wei

Outline

Introduction
System Model
System Design
Analysis and Evaluation
Related Work
Conclusion

Page 24: Tri hug 2010   wei

SecureMR – Analysis

Security Analysis
  o No false alarm
  o Non-repudiation

Attacker Behavior Analysis
  o Periodical attackers without collusion (Detection Rate)
  o Periodical attackers with collusion (Detection Rate)
  o Strategic attackers (Misbehaving Probability)

Detection Rate
  o We define the detection rate, denoted D_rate, as the probability that the inconsistency between results caused by the misbehavior is detected during l jobs.

Page 25: Tri hug 2010   wei

SecureMR – Analysis

[Figure: Detection Rate for Collusive Periodical Attacker, plotted against the duplication rate pb and the number of malicious workers m.]

Parameters
    # of workers n = 50
    misbehaving probability pm = 0.5
    # of blocks b = 20
    # of jobs l = 15
    pb: duplication rate
    m: # of malicious workers

Page 26: Tri hug 2010   wei

SecureMR – Evaluation

System Implementation
  o Implementation based on Hadoop
  o Two scheduling algorithms for comparison
      Naive task scheduling algorithm
      Commitment-based task scheduling algorithm
  o Non-blocking consistency verification

Experiment Setup
  o 14 hosts in Virtual Computing Lab (VCL)
  o 2.66GHz Intel(R) Core(TM) 2 Duo
  o Ubuntu Linux 8.04, Sun JDK 6 and Hadoop 0.19
  o Hadoop WordCount application

Page 27: Tri hug 2010   wei

SecureMR – Evaluation

Response Time
  o We define the response time as the time to finish map and reduce tasks in a job.

[Figure: Response Time vs Duplication Rate.]

Settings
    # of map tasks = 60
    # of reduce tasks = 25
    size of input data = 1GB

Page 28: Tri hug 2010   wei

Outline

Introduction
System Model
System Design
Analysis and Evaluation
Related Work
Conclusion

Page 29: Tri hug 2010   wei

Related Work

Research related to MapReduce
    Machine Learning [Cheng, et al., NIPS 2006]
    Data Intensive Computing [Ekanayake, et al., eScience 2008]
    Semantic Annotation [Laclavík, et al., ICCS 2008]
  o Little attention has been paid to integrity protection in MapReduce

Related techniques
    Sampling for uncheatable grid computing [Du, et al., ICDCS 2004]
    Quiz for result verification [Zhao, et al., P2P 2005]
    Majority voting and spot-checking [Sarmenta, et al., FGCS 2002]
  o None of them addressed unique challenges like massive data processing and multi-party distributed computation

Research on system security
    Securing publish-subscribe services [Srivatsa, et al., CCS 2005]
    PeerReview in distributed systems [Haeberlen, et al., SOSP 2007]
  o SecureMR focuses on a different domain

Page 30: Tri hug 2010   wei

Outline

Introduction
System Model
System Design
Analysis and Evaluation
Related Work
Conclusion

Page 31: Tri hug 2010   wei

Conclusion

To the best of our knowledge, our work makes the first attempt to address integrity assurance for MapReduce in open systems.

Contributions
  o A decentralized replication-based integrity verification scheme
  o A prototype of SecureMR
  o Analytical study and experimental evaluation of performance overhead

Future Work
  o Explore other techniques to address collusion attacks
  o Provide data quality assurance for the final result

Page 32: Tri hug 2010   wei

Thank you

Questions?