EECS Seminar on Optimization and Computing, NTNU, 30/09/2014
Getting Ready for Approximate Computing: Trading Parallelism for Accuracy for DSS Workloads
Pedro Moura Trancoso CASPER Research Group
Department of Computer Science University of Cyprus
In: Computing Frontiers 2014, Cagliari, Italy, 20-May-2014
Motivation
[Slide graphic: "2 + 2" computed three times returns 4.2, 3.9, and 4 — approximate hardware may produce inexact results.]
Motivation (for the new reality!)
• Processor Evolution
  – More cores
  – Increasing density
• Problem
  – Power consumption
• "Trivial" Solutions
  – (A) Disable part of the components¹
  – (B) Execute at lower voltage²
• Approximate Computing:
  – Design relaxed-accuracy components
  – Expose errors to the application level
  – Fits well for some apps (e.g. image processing)
¹ Dark Silicon  ² Sub-threshold Computing
But…
• Different conditions result in different error rates
• Different applications have different sensitivity to errors
• Design of adaptive tools to control this two-way variability
• We propose:
  – Trade Parallelism for Accuracy
  – Dynamically determine the number of replicas needed to improve results
  – A result-combining tool for DSS workloads
[Chart: QoR vs. Error Rate]
Increasing Accuracy with Parallelism
Where was CF hosted until 2011?
(A) Cagliari, Italy
(B) Bertinoro, Italy
(C) Ischia, Italy
(D) Paphos, Cyprus
Increasing Accuracy with Parallelism
Where was CF hosted until 2011?
(A) Cagliari, Italy
(B) Bertinoro, Italy
(C) Ischia, Italy
(D) Paphos, Cyprus

[Slide graphic: independent answers — B; A, B, D; C, D; B, C; C; C — are combined by frequency, converging on the correct answer: Bertinoro & Ischia, Italy (B and C).]
Correct Answer? Ask Philosophers, Sociologists…
Contributions
• Proposal of QoR, a metric that quantifies the quality of the results of a DSS query
• Analysis of query result sensitivity to relaxed correctness
• Proposal of a dynamic adaptive technique that is implemented as a tool
• Evaluation of the proposed technique using a real DSS workload on a real DBMS for different fault rate scenarios
Related Work
• Relaxed Correctness
  – Stochastic, Approximate, Inexact Computing [CF'12]
  – Metric of correctness
  – Applications detect and correct/tolerate errors
• Disciplined Approximate Programming Model [ASPLOS XVII]
• Fault tolerance:
  – Techniques to overcome transient faults: replication, redundancy
• Algorithmic changes to applications
• (Mixed Precision)
• Autonomic Computing
• Our Work: a combination of the above for database workloads
  – Adaptive dynamic combination for different hardware fault rates and query sensitivities
Modelling Approximation
• Disciplined Approximate Computing Model
• Localized fault injection in PostgreSQL
  – ExecQual function
  – Flipping the result: generates false positives and false negatives
  – Normal distribution: based on a random function
  – Error rate = frequency at which a tuple is evaluated incorrectly
  – Controlled error rate: threshold
  – Also emulates storage errors
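The flip-based fault injection described above can be sketched as follows. This is a minimal Python sketch, not the actual PostgreSQL C patch; `exec_qual_with_faults` is a hypothetical name standing in for the instrumented ExecQual:

```python
import random

def exec_qual_with_faults(tuple_passes, error_rate, rng):
    # Hypothetical sketch of the slide's localized fault injection:
    # with probability `error_rate`, flip the predicate outcome,
    # producing a false positive (tuple wrongly kept) or a false
    # negative (tuple wrongly dropped).
    if rng.random() < error_rate:
        return not tuple_passes  # injected fault: flipped evaluation
    return tuple_passes

# Filter 100,000 tuples at an error rate of 1 in 1,000.
rng = random.Random(42)
truth = [i % 2 == 0 for i in range(100_000)]
observed = [exec_qual_with_faults(t, 0.001, rng) for t in truth]
flips = sum(t != o for t, o in zip(truth, observed))
print(flips)  # on the order of 100 flipped evaluations
```

Controlling the error rate through a single threshold, as above, is what lets the experiments sweep cleanly from 0.00001 to 0.1.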
Quality of Result Metric (QoR)
• A query result may:
  – Include tuples that do not belong (false positives)
  – Exclude tuples that should belong (false negatives)
  – Include tuples with wrong values in fields
• AccuracyTP: how many of the true tuples are present (#ANS is the number of tuples in the correct answer)
• AccuracyFP: how the result is affected by false results (#RES is #TP + #FP)
• AccuracyOwn: how the values in fields are affected (ERR is the distance of a value from the correct one)

Example:
  Answer: <Pedro, UCY, 2>, <Andreas, UCY, 1>
  Result: <Pedro, UCY, 2>
  QoR = AccTP × AccFP × AccOwn = 1/2 × 1/1 × 1 = 50%
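The QoR computation can be sketched in Python. The relative-distance form of AccuracyOwn below is an assumption; the slide only states that ERR is the distance of a value from the correct one:

```python
def qor(answer, result):
    # QoR = AccTP * AccFP * AccOwn.
    # `answer` and `result` map a tuple key to its numeric field value.
    tp = [k for k in result if k in answer]
    acc_tp = len(tp) / len(answer)                      # true tuples present
    acc_fp = len(tp) / len(result) if result else 0.0   # penalty for false positives
    if tp:
        # Assumed AccOwn: 1 minus the relative distance of each field
        # value from the correct one, averaged over the true positives.
        acc_own = sum(1 - abs(result[k] - answer[k]) / max(abs(answer[k]), 1)
                      for k in tp) / len(tp)
    else:
        acc_own = 0.0
    return acc_tp * acc_fp * acc_own

answer = {("Pedro", "UCY"): 2, ("Andreas", "UCY"): 1}
result = {("Pedro", "UCY"): 2}
print(qor(answer, result))  # 0.5, matching the slide's 50% example
```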
Workload Sensitivity
[Chart: QoR per TPC-H query (Q1–Q22); TPC-H 100 MB, PostgreSQL, error rate = 1 in 100,000 (0.00001)]
Workload Sensitivity
[Chart: QoR per TPC-H query (Q1–Q22); TPC-H 100 MB, PostgreSQL, error rate = 1 in 10,000 (0.0001)]
Workload Sensitivity
[Chart: QoR per TPC-H query (Q1–Q22); TPC-H 100 MB, PostgreSQL, error rate = 1 in 1,000 (0.001)]
Workload Sensitivity
[Chart: QoR per TPC-H query (Q1–Q22); TPC-H 100 MB, PostgreSQL, error rate = 1 in 100 (0.01)]
Workload Sensitivity
• Intolerant: Q21
• Highly Sensitive: Q2, Q18
• Insensitive: Q1, Q14
[Chart: QoR per TPC-H query (Q1–Q22); TPC-H 100 MB, PostgreSQL, error rate = 1 in 10 (0.1)]
Workload Sensitivity: Breakdown
• Q21 (Intolerant): ErrorOwn (errors in aggregate values)
• Q2 (Highly Sensitive): ErrorTP (reduced correct tuples)
• Q18 (Highly Sensitive): ErrorFP (many extra tuples)
• Q1, Q14 (Insensitive): simple queries without nested operations
Nested Queries → Complex → More Operations → Sensitive
[Chart: 1−QoR error breakdown (ErrorTP, ErrorFP, ErrorOwn) per TPC-H query, Q1–Q22]
Combining Algorithm
• Trade parallel resources for accuracy: combine inaccurate results
• Remove the aggregate, filter out tuples with low frequency (less likely to belong to the result), then re-apply the aggregate
CombineResults( query ) {
  query' = cleanAggregate( query )
  results = execute n instances of query'
  uniqueTuples = frequency( results )
  combineResult = combine( uniqueTuples, x )
  newResult = aggregate( combineResult )
  return newResult
}
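The frequency-based filtering step of this pseudocode can be sketched as runnable Python. The aggregate removal and re-application are omitted, and `combine_results` is a hypothetical name:

```python
from collections import Counter

def combine_results(replica_results, x):
    # x-of-n voting: keep tuples that appear in at least x of the
    # replica results; low-frequency tuples are less likely to
    # belong to the true answer.
    freq = Counter(t for result in replica_results for t in set(result))
    return {t for t, count in freq.items() if count >= x}

# Three replicas of the same non-aggregate query under faults:
replicas = [
    {"a", "b", "c"},   # replica 1
    {"a", "b"},        # replica 2: false negative on "c"
    {"a", "c", "d"},   # replica 3: false positive "d"
]
print(sorted(combine_results(replicas, x=2)))  # ['a', 'b', 'c']
```

With the 2-of-3 threshold, the spurious tuple "d" (seen once) is filtered out while the tuple "c" that one replica dropped is recovered.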
Combining Implementations
• n replicas, threshold x for combining results
• Static combination of results:
  – x = ceil( n/2 ); n = 3, 5, 7
  – 2-of-3, 3-of-5, 4-of-7
• Adaptive dynamic combination of results:
  – In real life it is not possible to know query sensitivity and failure rate ahead of execution
  – Goal: the best (n, x)
  – Execute 5 replicas and obtain the first 10 results:
    • Apply 3-of-5
    • Observe how far the number of unique tuples is from 10
    • Use a heuristic to pick x-of-n
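The static thresholds follow directly from the majority rule x = ceil(n/2); a quick Python check:

```python
import math

# Static voting: majority threshold x = ceil(n/2) for n replicas.
schemes = {n: math.ceil(n / 2) for n in (3, 5, 7)}
for n, x in schemes.items():
    print(f"{x}-of-{n}")  # 2-of-3, 3-of-5, 4-of-7
```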
Adaptive Combine Tool Execution:
1. Remove the aggregate operation from the query.
2. Send 5 replicas of the non-aggregate query to the system and execute them to obtain the first 10 tuples.
3. Collect the 5 results, combine them, and decide on the number of replicas n to be used to complete the execution (1, 3, 5, or 7).
4. Continue the execution of the query on the n replicas.
5. Collect the results from the n replicas, combine them, and use the corresponding voting scheme to filter the results.
6. Apply the aggregate function to the combined, filtered result.
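Step 3's replica-count decision can be sketched as follows. The thresholds below are hypothetical — the slides only say a heuristic based on how far the number of unique tuples is from the expected 10 is used:

```python
def choose_replicas(unique_tuples, expected=10):
    # Hypothetical heuristic (the exact thresholds are assumptions):
    # the further the number of unique tuples in the 5-replica probe
    # diverges from the expected count, the noisier the execution,
    # so vote across more replicas.
    divergence = abs(unique_tuples - expected) / expected
    if divergence == 0:
        return 1   # replicas agree: no voting needed
    if divergence <= 0.2:
        return 3   # mild disagreement: 2-of-3
    if divergence <= 0.5:
        return 5   # moderate disagreement: 3-of-5
    return 7       # heavy disagreement: 4-of-7

print(choose_replicas(10), choose_replicas(14), choose_replicas(25))  # 1 5 7
```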
[Diagram: Query A enters the Adapt Combine tool, which dispatches replica queries A'1 … A'n to DBMS instances and combines their results.]
Experimental Setup
• Real DBMS: PostgreSQL v9.2.4
• Fault rates: 1 in 100,000 (0.00001), 1 in 10,000 (0.0001), 1 in 1,000 (0.001), 1 in 100 (0.01), and 1 in 10 (0.1)
• Workload: TPC-H 22 queries
  – Characterization: 20 queries
  – Evaluation: 10 representative queries: Q1, Q2, Q4, Q5, Q6, Q7, Q8, Q9, Q12, and Q13
• Input data set: SF = 0.1 (100 MB)
• Parts of the tool implemented in C
• System: Intel Core 2 Duo 1.4 GHz, OS X v10.9.1
QoR with Static Combining
• Schemes: Original (no replicas), 2-of-3, 3-of-5, 4-of-7, OracleBest (x-of-n)
• Combining increases QoR (QoR > 80%)
• OracleBest is not needed except for sensitive queries at high failure rates
[Chart: QoR per query (Q1, Q2, Q4–Q9, Q12, Q13) and fault rate (0.00001–0.1) for Original, 2-of-3, 3-of-5, 4-of-7, and OracleBest]
QoR with Adaptive Combining
• In most cases:
  – Adaptive "kicks in" when needed
  – Adaptive achieves the best of the static results
[Chart: QoR per query (Q1, Q2, Q4–Q9, Q12, Q13) and fault rate (0.00001–0.1), Original vs. Adaptive]
Evaluation of Adaptive Technique
• Evaluation:
  – 78%: same decision as the Oracle
  – 14%: used more replicas than needed (over-achieved QoR)
  – 8%: should have used more replicas (under-achieved QoR)
• Query "satisfaction":
  – More than 80% of queries reach QoR > 80% at error rates up to 1 in 100
[Charts: pie — decisions 78% Correct, 14% Over, 8% Under; bar — fraction of queries with QoR > 80% per error rate, Original vs. Adaptive]
• Goal: trade-off between QoR and Parallelism
• Assumption: QoR > 80%
QoR and Parallelism
• Adaptive:
  – Trades Par for QoR compared to Original
  – Same QoR as Static, but better Par
  – Slightly lower QoR than Oracle, but an effective dynamic scheme
[Chart: average QoR and Par for Original, 2-of-3, 3-of-5, 4-of-7, Oracle80%, OracleBest, and Adaptive]
Degree of Parallelism (Par): 1 replica = 100%, 3 replicas = 33.3%. Averaged over the 10 queries and 5 fault scenarios.
• Adaptive:
  – Increased average QoR from 68.9% to 79.3% with a parallelism degree of 55.6%
  – Achieved QoR within 7% of Oracle
  – Gained 22.3% in parallelism over Static
  – Correctly selects the strategy 78% of the time
Conclusions
• Motivation:
  – Handling errors at the application level
  – Exploiting Approximate Computing for DSS workloads
• Contributions:
  – Quality-of-Result metric
  – Sensitivity analysis
  – Fault injection on a real DBMS (PostgreSQL)
  – Testing using a real DSS workload (TPC-H)
  – Adaptive dynamic scheme to account for query sensitivity and fault rate, plus a supporting tool
Adaptive Combining increased QoR to nearly the best (Oracle), with gains in degree of parallelism over the alternative static schemes, and requires no a priori knowledge.
CASPER GROUP Computer Architecture Systems Performance Evaluation Research www.cs.ucy.ac.cy/carch/casper
Thank you!