

EECS Seminar on Optimization and Computing, NTNU, 30/09/2014

Getting Ready for Approximate Computing: Trading Parallelism for Accuracy for DSS Workloads

Pedro Moura Trancoso CASPER Research Group

Department of Computer Science University of Cyprus

In: Computing Frontiers 2014, Cagliari, Italy, 20-May-2014


Motivation

2 + 2 = 4      2 + 2 = 4.2      2 + 2 = 3.9


Motivation (for the new reality!)

•  Processor Evolution
–  More cores
–  Increasing density

•  Problem
–  Power consumption

•  "Trivial" Solutions
–  (A) Disable part of the components¹
–  (B) Execute at a lower voltage²

•  Approximate Computing:
–  Design relaxed-accuracy components
–  Expose errors to the application level
–  Fits well for some apps (e.g. image processing)

¹ Dark Silicon   ² Sub-threshold Computing


But…

•  Different conditions result in different error rates
•  Different applications have different sensitivity to errors
•  Design of adaptive tools to control this two-way variability

•  We propose:
–  Trade parallelism for accuracy
–  Dynamically determine the number of replicas needed to improve results
–  A result-combining tool for DSS workloads

[Figure: QoR as a function of error rate]


Increasing Accuracy with Parallelism


Where was CF hosted until 2011?
(A) Cagliari, Italy   (B) Bertinoro, Italy   (C) Ischia, Italy   (D) Paphos, Cyprus   ✗


Increasing Accuracy with Parallelism


Where was CF hosted until 2011?
(A) Cagliari, Italy   (B) Bertinoro, Italy   (C) Ischia, Italy   (D) Paphos, Cyprus

[Figure: replica answers B; A, B, D; C, D; B, C; C are combined by successive majority filtering: A B C D → B D C → B C → C]

Bertinoro & Ischia, Italy
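The combining idea behind this quiz can be sketched as frequency-based voting over replica answers. The replica answers below are hypothetical, and `x_of_n` is an illustrative helper, not the paper's tool:

```python
from collections import Counter

def x_of_n(replica_answers, x):
    """Keep the options reported by at least x of the n replicas."""
    counts = Counter(opt for ans in replica_answers for opt in ans)
    return {opt for opt, c in counts.items() if c >= x}

# Hypothetical answers from 5 faulty replicas to the quiz above.
replicas = [{"C"}, {"B", "C"}, {"C", "D"}, {"B"}, {"C"}]
print(x_of_n(replicas, 3))  # → {'C'}: 3-of-5 voting filters out the noise
```

Raising or lowering the threshold x trades false positives for false negatives, which is exactly the knob the adaptive tool later tunes.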


Correct Answer? Ask Philosophers, Sociologists…


Contributions

•  Proposal of QoR, a metric that quantifies the quality of the results of a DSS query
•  Analysis of query result sensitivity to relaxed correctness
•  Proposal of a dynamic adaptive technique, implemented as a tool
•  Evaluation of the proposed technique using a real DSS workload on a real DBMS under different fault-rate scenarios


Related Work

•  Relaxed Correctness
–  Stochastic, Approximate, Inexact Computing [CF'12]
–  Metric of correctness
–  Applications detect and correct/tolerate errors
•  Disciplined Approximate Programming Model [ASPLOS XVII]
•  Fault tolerance:
–  Techniques to overcome transient faults: replication, redundancy
•  Algorithmic changes to applications
•  (Mixed Precision)
•  Autonomic Computing
•  Our work: a combination of the above for database workloads; an adaptive dynamic combination for different hardware fault rates and query sensitivities


Modelling Approximation

•  Disciplined Approximate Computing Model
•  Localized fault injection in PostgreSQL
–  ExecQual function
–  Flipping the result generates false positives and false negatives
–  Normal distribution, based on a random function
–  Error rate = frequency at which a tuple is evaluated incorrectly
–  Controlled error rate via a threshold
–  Also emulates storage errors
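The injection model described above can be emulated outside the DBMS. The sketch below is not PostgreSQL's actual ExecQual code; `faulty_qual`, the predicate, and the rates are illustrative stand-ins for the described behavior (flip a tuple's qualification with probability equal to the error rate):

```python
import random

def faulty_qual(predicate, row, error_rate, rng):
    """Evaluate a filter predicate, flipping the outcome with probability
    error_rate: a flipped True is a false positive, a flipped False a
    false negative (mirroring the injected ExecQual faults)."""
    result = predicate(row)
    if rng.random() < error_rate:
        return not result  # injected fault: flip the qualification
    return result

rng = random.Random(1)
rows = list(range(1000))
pred = lambda v: v < 500          # exact answer: 500 qualifying tuples
kept = [v for v in rows if faulty_qual(pred, v, 0.01, rng)]
# With an error rate of 1 in 100, roughly 1% of tuples are misjudged.
```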


Quality of Result Metric (QoR)

•  A query result may:
–  Include tuples that do not belong (false positives)
–  Exclude tuples that should belong (false negatives)
–  Include tuples with wrong values in fields

•  AccuracyTP: how many of the true tuples are present (#ANS is the number of tuples in the correct answer)
•  AccuracyFP: how the result is affected by false results (#RES is #TP + #FP)
•  AccuracyOwn: how the values in fields are affected (ERR is the distance of a value from the correct one)

Example:
Answer: <Pedro, UCY, 2>, <Andreas, UCY, 1>
Result: <Pedro, UCY, 2>
QoR = AccTP × AccFP × AccOwn = 1/2 × 1/1 × 1 = 50%
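The worked example can be checked with a direct implementation of the metric. This is a sketch: tuples are compared for exact equality, so AccOwn is simply 1 here, matching the example where the surviving tuple carries correct field values:

```python
def qor(answer, result):
    """QoR = AccTP * AccFP * AccOwn. Tuples are compared for exact
    equality, so AccOwn (field-value distance) is 1 in this sketch."""
    tp = [t for t in result if t in answer]
    acc_tp = len(tp) / len(answer)   # true tuples present in the result
    acc_fp = len(tp) / len(result)   # fraction of the result that is true
    acc_own = 1.0                    # matched tuples carry correct values
    return acc_tp * acc_fp * acc_own

answer = [("Pedro", "UCY", 2), ("Andreas", "UCY", 1)]
result = [("Pedro", "UCY", 2)]
print(qor(answer, result))  # → 0.5, i.e. QoR = 1/2 × 1/1 × 1 = 50%
```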


Workload Sensitivity

[Figure: QoR per TPC-H query. TPC-H 100MB, PostgreSQL, error rate = 1 in 100,000 (0.00001)]


Workload Sensitivity

[Figure: QoR per TPC-H query. TPC-H 100MB, PostgreSQL, error rate = 1 in 10,000 (0.0001)]


Workload Sensitivity

[Figure: QoR per TPC-H query. TPC-H 100MB, PostgreSQL, error rate = 1 in 1,000 (0.001)]


Workload Sensitivity

[Figure: QoR per TPC-H query. TPC-H 100MB, PostgreSQL, error rate = 1 in 100 (0.01)]


Workload Sensitivity

•  Intolerant: Q21
•  Highly Sensitive: Q2, Q18
•  Insensitive: Q1, Q14

[Figure: QoR per TPC-H query. TPC-H 100MB, PostgreSQL, error rate = 1 in 10 (0.1)]


Workload Sensitivity: Breakdown

•  Q21 (Intolerant): ErrorOwn (errors in aggregate values)
•  Q2 (Highly Sensitive): ErrorTP (reduced correct tuples)
•  Q18 (Highly Sensitive): ErrorFP (many extra tuples)
•  Q1, Q14 (Insensitive): simple queries without nested operations

Nested queries → complex → more operations → sensitive

[Figure: 1 − QoR per TPC-H query, broken down into ErrorTP, ErrorFP, and ErrorOwn components]


Combining Algorithm

•  Trade parallel resources for accuracy: combine inaccurate results
•  Remove the aggregate, filter out tuples with smaller frequency (less probable to belong to the result), apply the aggregate

CombineResults( query ) {
    query' = cleanAggregate( query )
    results = execute n instances of query'
    uniqueTuples = frequency( results )
    combineResult = combine( uniqueTuples, x )
    newResult = aggregate( combineResult )
    return newResult
}
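A runnable sketch of the combining step, under the assumption that replica results arrive as lists of tuples and that the removed aggregate is a SUM over the second field (both stand-ins; the real tool operates on DBMS result sets):

```python
from collections import Counter

def combine_results(replica_results, x):
    """Keep tuples that appear in at least x of the n replica results,
    then apply the re-attached aggregate (here a stand-in SUM)."""
    freq = Counter(t for res in replica_results for t in set(res))
    survivors = [t for t, c in freq.items() if c >= x]
    return sum(v for (_, v) in survivors)

# n = 3 replicas; a fault added a spurious tuple to the first replica.
replicas = [
    [("a", 10), ("b", 20), ("z", 99)],   # ("z", 99) is a false positive
    [("a", 10), ("b", 20)],
    [("a", 10), ("b", 20)],
]
print(combine_results(replicas, x=2))  # → 30: 2-of-3 voting drops ("z", 99)
```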


Combining Implementations

•  n replicas, threshold x for combining results
•  Static combination of results
–  x = ceil( n/2 ); n = 3, 5, 7
–  2-of-3, 3-of-5, 4-of-7
•  Adaptive dynamic combination of results
–  Real life: not possible to know the query sensitivity and failure rate ahead of execution
–  Goal: the best (n, x)
–  Execute 5 replicas and obtain the first 10 results:
•  Apply 3-of-5
•  Observe how far the number of unique tuples is from 10
•  Use a heuristic to pick x-of-n
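The adaptive choice of (n, x) might look as follows. The drift thresholds are illustrative assumptions, since the slides describe the heuristic only qualitatively:

```python
import math

def choose_replicas(unique_tuples, sample_size=10):
    """Pick (n, x) from the 10-tuple probe: the further the number of
    unique tuples drifts from the sample size, the noisier the replicas
    and the more redundancy is worth spending. Thresholds are illustrative."""
    drift = abs(unique_tuples - sample_size) / sample_size
    if drift == 0:
        n = 1                 # replicas agree: no redundancy needed
    elif drift <= 0.2:
        n = 3
    elif drift <= 0.5:
        n = 5
    else:
        n = 7
    return n, math.ceil(n / 2)   # majority threshold x = ceil(n/2)

print(choose_replicas(10))   # → (1, 1)
print(choose_replicas(16))   # → (7, 4): heavy disagreement, use 4-of-7
```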


Adaptive Combine Tool

Execution:
1.  Remove the aggregate operation from the query.
2.  Send 5 replicas of the non-aggregate query to the system and execute them to obtain the first 10 tuples.
3.  Collect the 5 results, combine them, and decide on the number of replicas n to be used to complete the execution (1, 3, 5, or 7).
4.  Continue the execution of the query in the n replicas.
5.  Collect the results from the n replicas, combine them, and use the corresponding voting scheme to filter the results.
6.  Apply the aggregate function to the combined, filtered result.

[Diagram: the Adapt Combine tool turns Query A into replicas Query A'1 … Query A'n, sends each to a DBMS instance, and combines the returned results]


Experimental Setup

•  Real DBMS: PostgreSQL v9.2.4
•  Fault rates: 1 in 100,000 (0.00001), 1 in 10,000 (0.0001), 1 in 1,000 (0.001), 1 in 100 (0.01), and 1 in 10 (0.1)
•  Workload: TPC-H, 22 queries
–  Characterization: 20 queries
–  Evaluation: 10 representative queries: Q1, Q2, Q4, Q5, Q6, Q7, Q8, Q9, Q12, and Q13
•  Input data set: SF = 0.1, i.e. 100MB
•  Parts of the tool implemented as C code
•  System: Intel Core 2 Duo 1.4GHz, OS X v10.9.1


QoR with Static Combining

•  Original (no replicas), 2-of-3, 3-of-5, 4-of-7, OracleBest (the best x-of-n)
•  Combining yields an increase in QoR (QoR > 80%)
•  OracleBest is not needed except for sensitive queries at high failure rates

[Figure: QoR per query (Q1, Q2, Q4, Q5, Q6, Q7, Q8, Q9, Q12, Q13) and per error rate (0.00001 to 0.1), for Original, 2-of-3, 3-of-5, 4-of-7, and OracleBest]


QoR with Adaptive Combining

•  In most cases:
–  Adaptive "kicks in" when needed
–  Adaptive achieves the best of the static results

[Figure: QoR per query (Q1, Q2, Q4, Q5, Q6, Q7, Q8, Q9, Q12, Q13) and per error rate (0.00001 to 0.1), for Original and Adaptive]


Evaluation of Adaptive Technique

•  Goal: trade-off between QoR and parallelism
•  Assumption: QoR > 80%

•  Evaluation:
–  78% same decision as the Oracle
–  14% used more replicas than needed (over-achieved QoR)
–  8% needed more replicas (under-achieved QoR)

•  Query "satisfaction":
–  80%+ of queries with QoR > 80%, up to a 1-in-100 error rate

[Figure: pie chart of adaptive decisions (Correct 78%, Over 14%, Under 8%) and bar chart of query satisfaction per error rate for Original and Adaptive]


QoR and Parallelism

•  Adaptive:
–  Trades parallelism for QoR compared to Original
–  Same QoR as Static but better parallelism
–  Slightly lower QoR than Oracle, but an effective dynamic scheme

[Figure: QoR and Par for Original, 2-of-3, 3-of-5, 4-of-7, Oracle80%, OracleBest, and Adaptive]

Degree of Parallelism (Par):
-  1 replica = 100%
-  3 replicas = 33.3%
Averages over 10 queries and 5 fault scenarios

•  Adaptive:
–  Increased average QoR from 68.9% to 79.3% with a parallelism degree of 55.6%
–  Achieved QoR within 7% of the Oracle
–  Gained 22.3% in parallelism over Static
–  Correctly selects the strategy 78% of the time
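For reference, the per-run Par metric is just the reciprocal of the replica count (a minimal sketch; the 55.6% figure above is an average over runs that used different n):

```python
def parallelism_degree(n_replicas):
    """Degree of parallelism (Par) relative to a single replica:
    1 replica = 100%, 3 replicas = 33.3%, and so on."""
    return 100.0 / n_replicas

print(round(parallelism_degree(3), 1))   # → 33.3
print(round(parallelism_degree(7), 1))   # → 14.3
```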


Conclusions

•  Motivation:
–  Handling errors at the application level
–  Exploiting approximate computing for DSS workloads
•  Contributions:
–  Quality-of-Result metric
–  Sensitivity analysis
–  Fault injection on a real DBMS (PostgreSQL)
–  Testing using a real DSS workload (TPC-H)
–  Adaptive dynamic scheme and tool accounting for query sensitivity and fault rate

Adaptive Combining increased QoR to almost the best (Oracle), with gains in degree of parallelism compared to the alternative static schemes and no a priori knowledge.


CASPER GROUP
Computer Architecture Systems Performance Evaluation Research
www.cs.ucy.ac.cy/carch/casper

Thank you!