Yongzhi Wang, Jinpeng Wei VIAF: Verification-based Integrity Assurance Framework for MapReduce


Yongzhi Wang, Jinpeng Wei

VIAF: Verification-based Integrity Assurance Framework for MapReduce


MapReduce in Brief

Satisfies the demand for large-scale data processing.

It is a parallel programming model introduced by Google in 2004 that has become popular in recent years.

Data is stored in a DFS (Distributed File System).

Computation entities consist of one Master and many Workers (Mappers and Reducers).

Computation is divided into two phases: Map and Reduce.


Word Count instance

[Figure: word count example; the surviving labels are the key-value pairs (Hello, 2) and (Hello, 4)]
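The two phases of a word count job can be illustrated with a minimal single-process Python sketch. It is not Hadoop code: the function names `map_phase` and `reduce_phase` and the two input splits are made up for illustration, and a real job would run the map calls on separate workers.

```python
from collections import defaultdict


def map_phase(split):
    """Map: emit one (word, 1) pair for every word in an input split."""
    return [(word, 1) for word in split.split()]


def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each distinct word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


# Hypothetical input splits, one per mapper.
splits = ["Hello World", "Hello MapReduce"]

# Collect the intermediate pairs from every mapper, then reduce them.
intermediate = [pair for s in splits for pair in map_phase(s)]
word_counts = reduce_phase(intermediate)
print(word_counts)
```

Because the reducer only sees the pairs the mappers emit, a mapper that reports wrong pairs silently corrupts the final counts.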


Integrity Vulnerability

How can malicious mappers be detected and eliminated to guarantee high computation integrity?

Straightforward approach: Duplication

[Figure label: Non-collusive worker]


Integrity Vulnerability

But how can collusive malicious workers be handled?


VIAF Solution

Duplication

Introduce a trusted entity to perform random verification


Outline

Motivation
System Design
Analysis and Evaluation
Related Work
Conclusion


Architecture


Assumptions

Trusted entities: Master, DFS, Reducers, Verifiers

Untrusted entities: Mappers

Information is not tampered with during network communication.

The result a mapper reports to the master is consistent with its local storage (commitment-based protocol in SecureMR).


Solution

Idea: Duplication + Verification

Deterministically duplicate each task to TWO mappers to detect non-collusive malicious mappers.

Non-deterministically verify consistent results to detect collusive malicious mappers.

Each mapper accumulates credit by passing verifications.

A mapper becomes trusted only when its credit reaches the Quiz Threshold.

100% accuracy in detecting non-collusive malicious mappers.

The more verifications are applied to a mapper, the more accurately it can be determined whether that mapper is collusive and malicious.
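The duplication and verification steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Mapper` class, the `handle_task` function, the `verify` callback and the three status strings are hypothetical names. The slides do not say what happens to a task after a mismatch or how unverified results are held, so the sketch simply reschedules in the first case and keeps results pending until both mappers reach the quiz threshold in the second.

```python
import random
from dataclasses import dataclass


@dataclass
class Mapper:
    name: str
    credit: int = 0            # number of verifications passed so far
    blacklisted: bool = False  # set once the mapper is caught colluding


def handle_task(r1, r2, m1, m2, verify, v, k, rng=random):
    """Decide what to do with results r1 and r2 reported by mappers m1 and m2.

    verify: hypothetical callback that reruns the task on the trusted verifier
    v: verification probability, k: quiz threshold
    Returns "reschedule", "pending" or "accepted" (assumed status names).
    """
    if r1 != r2:
        # Duplication alone exposes a non-collusive cheat: the copies differ.
        return "reschedule"
    if rng.random() < v:
        # Consistent results are spot-checked with probability v.
        if verify() != r1:
            # Both mappers agreed on a wrong answer, so they colluded.
            m1.blacklisted = m2.blacklisted = True
            return "reschedule"
        m1.credit += 1
        m2.credit += 1
    # Assumption: a result is released only once both mappers are trusted.
    if m1.credit >= k and m2.credit >= k:
        return "accepted"
    return "pending"
```

With `k = 0` every consistent, unverified result is accepted immediately, which is why collusive mappers can slip through when no quiz threshold is enforced.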


Data flow control

[Figure: data flow diagram with numbered steps 1, 2, 3, 4, 5.a, 5.b and 6]


Theoretical Analysis

Measurement metrics (for each task)

Accuracy: the probability that the reducer receives a good result from the mappers

Mapping Overhead: the average number of executions launched by mappers

Verification Overhead: the average number of executions launched by the verifier
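As a rough illustration, the three metrics can be estimated empirically from per-task logs. The sketch below assumes a hypothetical `TaskRecord` type holding, for each task, whether the reducer received a good result and how many mapper and verifier executions the task consumed. Neither the record layout nor the `summarize` function comes from the slides.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    correct: bool       # did the reducer receive a good result for this task?
    mapper_runs: int    # executions launched on mappers for this task
    verifier_runs: int  # executions launched on the verifier for this task


def summarize(records):
    """Average the per-task records into the three metrics."""
    n = len(records)
    return {
        "accuracy": sum(r.correct for r in records) / n,
        "mapping_overhead": sum(r.mapper_runs for r in records) / n,
        "verification_overhead": sum(r.verifier_runs for r in records) / n,
    }


# Hypothetical log of four tasks; the third one was rescheduled once.
log = [
    TaskRecord(True, 2, 0),
    TaskRecord(True, 2, 1),
    TaskRecord(True, 4, 0),
    TaskRecord(False, 2, 0),
]
metrics = summarize(log)
print(metrics)
```

A task that is never rescheduled contributes exactly 2 mapper executions because of the duplication, so a mapping overhead above 2.0 reflects rescheduling.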


Accuracy vs Quiz Threshold

Collusive mappers only, different p:

m: malicious worker ratio
c: collusive worker ratio
p: probability that two assigned collusive workers are in the same collusion group and can commit a cheat
q: probability that two collusive workers commit a cheat
r: probability that a non-collusive worker commits a cheat
v: verification probability
k: quiz threshold

[Figure: Accuracy vs Quiz Threshold, m=0.4, c=1.0, q=1.0, r=0.0. x-axis: quiz threshold k (0 to 20); y-axis: accuracy (0.70 to 1.00); curves for p=0.3, p=0.5 and p=1.0]


Overhead vs Quiz Threshold

Collusive workers only: mapper overhead for each task stays between 2.0 and 2.4.

Collusive workers only: verification overhead for each task stays between 0.2 and 0.207 when v is 0.2.

[Figure: Verification Overhead vs Quiz Threshold, m=0.4, c=1.0, q=1.0, r=0.0, v=0.2. x-axis: quiz threshold k (0 to 20); y-axis: verification overhead (0.190 to 0.210); curves for p=0.3, p=0.5 and p=1.0]

[Figure: Mapper Overhead vs Quiz Threshold, m=0.4, c=1.0, q=1.0, r=0.0. x-axis: quiz threshold k (0 to 20); y-axis: overhead (1.5 to 3.0); curves for p=0.3, p=0.5 and p=1.0]


Experimental Evaluation

Implementation based on Hadoop 0.21.0

11 virtual machines (512 MB of RAM, 40 GB disk each, Debian 5.0.6) deployed on a 2.93 GHz, 8-core Intel Xeon CPU with 16 GB of RAM

Word count application, 400 mapping tasks and 1 reduce task

Out of 11 virtual hosts:
1 is both the Master and a benign worker
1 is the verifier
4 are collusive workers (malicious ratio is 40%)
5 are benign workers


Evaluation result

Quiz Threshold   Accuracy   Mapper Overhead   Verification Overhead
0                87.20%     2.000             0
1                99.42%     2.045             22.00%
2                99.83%     2.074             23.58%
3                100%       2.053             23.00%
4                100%       2.162             23.58%
5                100%       2.046             21.75%
6                100%       2.111             22.58%
7                100%       2.027             19.83%

Where c is 1.0, p is 1.0, q is 1.0, and m is 0.40.
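To read the table programmatically, the short Python sketch below copies its rows into a list and derives two summary figures. The helper names `smallest_fully_accurate_k` and `mean_verification_overhead` are hypothetical, and nothing here goes beyond simple arithmetic on the reported values.

```python
# Rows copied from the evaluation table:
# (quiz threshold k, accuracy %, mapper overhead, verification overhead %)
ROWS = [
    (0, 87.20, 2.000, 0.00),
    (1, 99.42, 2.045, 22.00),
    (2, 99.83, 2.074, 23.58),
    (3, 100.0, 2.053, 23.00),
    (4, 100.0, 2.162, 23.58),
    (5, 100.0, 2.046, 21.75),
    (6, 100.0, 2.111, 22.58),
    (7, 100.0, 2.027, 19.83),
]


def smallest_fully_accurate_k(rows):
    """Return the lowest quiz threshold whose measured accuracy is 100%."""
    return min(k for k, acc, _, _ in rows if acc >= 100.0)


def mean_verification_overhead(rows):
    """Average verification overhead (%) over the rows with k >= 1."""
    verified = [vo for k, _, _, vo in rows if k >= 1]
    return sum(verified) / len(verified)


print(smallest_fully_accurate_k(ROWS))
print(round(mean_verification_overhead(ROWS), 2))
```

In this experiment the measured accuracy reaches 100% from a quiz threshold of 3 onward, while the verification overhead stays roughly constant at about 22% for every threshold of 1 or more.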

[Figure: Accuracy vs Quiz Threshold, m=0.4, c=1.0, q=1.0, r=0.0. x-axis: quiz threshold k (0 to 20); y-axis: accuracy (0.70 to 1.00); curves for p=0.3, p=0.5 and p=1.0]


Related Work

Replication, sampling, and checkpoint-based solutions were proposed in several distributed computing contexts:

Duplication-based for Cloud: SecureMR (Wei et al., ACSAC '09), Fault Tolerance Middleware for Cloud Computing (Wenbing et al., IEEE CLOUD '10)

Quiz-based for P2P: Result verification and trust based scheduling in peer-to-peer grids (S. Zhao et al., P2P '05)

Sampling-based for Grid: Uncheatable Grid Computing (Wenliang et al., ICDCS '04)

Accountability in Cloud computing research:

Hardware-based attestation: Seeding Clouds with Trust Anchors (Joshua et al., CCSW '10)

Logging and Auditing: Accountability as a Service for the Cloud (Jinhui et al., ICSC '10)


Conclusion

Contributions

Proposed a framework (VIAF) to defeat both collusive and non-collusive malicious mappers in MapReduce computation.

Implemented the system and showed through theoretical analysis and experimental evaluation that VIAF can guarantee high accuracy while incurring acceptable overhead.

Future work

Explore the design without the assumption that reducers are trusted.

Reuse cached results to better utilize verifier resources.


THANKS!

Q&A