NOTICE: A Framework for Non-functional Testing of Compilers
Mohamed BOUSSAA, Olivier BARAIS, Gerson SUNYE, Benoit BAUDRY
INRIA Rennes, France
2016 IEEE International Conference on Software Quality, Reliability & Security (QRS 2016), August 1-3, 2016 - Vienna, Austria

QRS16: NOTICE: A Framework for Non-functional Testing of Compilers


Page 1: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

NOTICE: A Framework for Non-functional Testing of Compilers

Mohamed BOUSSAA

Olivier BARAIS

Gerson SUNYE

Benoit BAUDRY

2016 IEEE International Conference on Software Quality, Reliability & Security (QRS 2016)

August 1-3, 2016 - Vienna, Austria

INRIA Rennes, France


Page 2: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

Outline

1. Context

2. Motivating Example

3. NOTICE: A Framework for Non-functional Testing of Compilers

4. Performance Evaluation

5. Conclusion

Page 3: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

Motivation

Current innovations in science and industry demand ever-increasing computing resources while placing strict requirements on system performance, power consumption, size, response, reliability, portability and design time

[Diagram: C compilers apply optimizations to translate source code into several machine code variants, under resource constraints]

Satisfy the non-functional requirements for a broad range of programs and architectures

Page 4: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

Compiler fine/auto-tuning is complex

Huge design space for optimization options (more than 150 optimizations)

• compiling a program means trading off between various objectives
• compilation time, code quality, code size, ...

Constructing a good set of optimization levels (-Ox) is hard

• conflicting objectives, complex interactions, unknown effect of some optimizations, ...

Page 5: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

Trying to please everyone

Program-independent universal sequences (e.g., -O1, -O2, -O3, etc.)

Based on heuristics and experience

Each optimization level allows trading off various objectives

• O1: "take your time, give it your best shot"
• O2: "optimize, and be quick about it"
• O3: "I’m feeling lucky, and have lots of time"

How efficient are predefined/universal compiler levels?

Page 6: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

Motivating Example

GCC 4.8.4:
- 78 optimizations
- 2^78 combinations

[Cartoon: a developer facing resource constraints (speedup, memory, etc.); captions: "Why always me?" and "Good luck, son!"]

- BOSS: Clients complain about the high memory consumption
- BOSS: Is it possible to consume less CPU? We don’t have enough resources/money
- BOSS: Please, can we optimize even more?

- Testing each optimization configuration is impossible
- Heuristics are needed
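To make the size of this search space concrete, here is a small arithmetic sketch. It assumes each of the 78 optimizations is simply switched on or off, which gives 2^78 combinations, and it assumes a cost of one second per evaluated configuration purely for illustration.

```python
# Size of the on/off configuration space for the 78 GCC 4.8.4 optimizations.
NUM_OPTIMIZATIONS = 78  # figure quoted on the slide


def search_space_size(num_flags: int) -> int:
    """Each optimization is either enabled or disabled: 2**n combinations."""
    return 2 ** num_flags


def exhaustive_years(num_flags: int, seconds_per_eval: float) -> float:
    """Years needed to evaluate every combination one after the other."""
    seconds_per_year = 365 * 24 * 3600
    return search_space_size(num_flags) * seconds_per_eval / seconds_per_year


if __name__ == "__main__":
    print(f"{search_space_size(NUM_OPTIMIZATIONS):.3e} combinations")
    # 1.0 s per configuration is an assumed, optimistic evaluation cost.
    print(f"{exhaustive_years(NUM_OPTIMIZATIONS, 1.0):.3e} years at 1 s each")
```

Even under that optimistic assumption, exhaustive testing is out of reach, which is why the slide concludes that heuristics are needed.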

Page 7: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

NOTICE: A Framework for Non-functional Testing of Compilers

https://noticegcc.wordpress.com

Page 8: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

Contributions

We propose:

1- Diversity-based exploration

• Novel formulation of the compiler optimization problem using Novelty Search
• Diverse optimization sequences
• Explore the large search space by considering Novelty as the main objective

2- Microservice-based infrastructure

• Execute and monitor the different variants of optimized code using system containers
• Resource isolation and management
• Provide a fine-grained understanding and analysis of compiler behavior regarding optimizations
• Automatic extraction of non-functional properties relative to resource usage
• Finely auto-tuning compilers according to user (non-functional) requirements

Page 9: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

Diversity-based exploration

Solution representation: a bit vector (e.g., 0 0 1 0 …) mapped to a set of GCC optimization flags:

gcc -c test.c -fno-dce -fno-dse -fdce -fno-align-loops …

Step 1: Random generation

Step 2: Evaluation
• Novelty metric: calculate the distance of one solution from its K nearest neighbors in the current population and in the archive.
• Archive: saves solutions that get a novelty metric value higher than a specific novelty threshold value.

Step 3: Selection
• Tournament selection: select solutions to evolve based on novelty scores.

Step 4: Evolutionary operators
• Mutation and crossover, applied to bit vectors (e.g., 0 1 1 1 0 … and 1 0 0 1 1 …)

Go to Step 2

Best solution: solution with best non-functional improvement
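The four steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the NOTICE implementation: Hamming distance as the distance measure, the parameter values, and the `fitness` callback (a stand-in for a measured non-functional improvement such as speedup) are all assumptions.

```python
import random


def hamming(a, b):
    """Number of positions at which two bit vectors differ (assumed distance)."""
    return sum(x != y for x, y in zip(a, b))


def novelty(candidate, population, archive, k=3):
    """Mean distance to the k nearest neighbours in population and archive."""
    others = [s for s in population if s is not candidate] + list(archive)
    nearest = sorted(hamming(candidate, s) for s in others)[:k]
    return sum(nearest) / len(nearest) if nearest else 0.0


def tournament(population, scores, size=2, rng=random):
    """Return the most novel of `size` randomly drawn individuals."""
    contenders = rng.sample(range(len(population)), size)
    return population[max(contenders, key=lambda i: scores[i])]


def crossover(a, b, rng=random):
    """One-point crossover of two bit vectors."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(bits, rate=0.1, rng=random):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def to_flags(bits, names):
    """Map a bit vector to -f<name> / -fno-<name> GCC flags."""
    return [f"-f{n}" if b else f"-fno-{n}" for b, n in zip(bits, names)]


def novelty_search(n_bits, fitness, pop_size=20, generations=30,
                   threshold=2.0, k=3, seed=0):
    rng = random.Random(seed)
    # Step 1: random generation
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    archive, best = [], None
    for _ in range(generations):
        # Step 2: evaluation (novelty score, archive update, best-so-far)
        scores = [novelty(s, population, archive, k) for s in population]
        for s, score in zip(population, scores):
            if score > threshold and s not in archive:
                archive.append(s)
            if best is None or fitness(s) > fitness(best):
                best = s
        # Steps 3 and 4: tournament selection, then crossover and mutation
        population = [
            mutate(crossover(tournament(population, scores, rng=rng),
                             tournament(population, scores, rng=rng), rng),
                   rng=rng)
            for _ in range(pop_size)
        ]
    return best, archive
```

For example, `to_flags([1, 0], ["dce", "dse"])` yields `-fdce -fno-dse`. Selection is driven only by novelty; `fitness` is consulted solely to remember the best solution seen, which mirrors the "best solution" box on the slide.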

Page 10: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

Contributions

We propose:

1- Diversity-based exploration

• Novel formulation of the compiler optimization problem using Novelty Search
• Diverse optimization sequences
• Explore the large search space by considering Novelty as the main objective

2- Microservice-based infrastructure

• Execute and monitor the different variants of optimized code using lightweight system containers
• Provide a fine-grained understanding and analysis of compiler behavior regarding optimizations
• Resource isolation and management
• Automatic extraction of non-functional properties relative to resource usage
• Finely auto-tuning compilers according to user (non-functional) requirements

Page 11: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

NOTICE Infrastructure

1. Code Execution: compile and execute optimized code within a new container instance

2. Runtime Monitoring: gather at runtime non-functional properties of running programs under test

3. Time Series Database: save information relative to resource consumption within a time series database

4. Performance Analysis: analysis of the performance and non-functional properties of programs under test

Page 12: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

NOTICE Infrastructure

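[Architecture diagram: optimizations are applied to the Component Under Test (running); the Monitoring Component reads CPU and memory usage from the cgroup file systems and produces monitoring records; the Back-end Database Component stores them in a time-series database; the Front-end Visualization Component retrieves them through HTTP requests]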
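As an illustration of what reading resource usage from the cgroup file systems can look like, the sketch below parses two cgroup v1 counters. The file names `memory.usage_in_bytes` and `cpuacct.usage` are standard cgroup v1 files; the directory arguments, function names, and the returned record layout are hypothetical and not taken from NOTICE.

```python
from pathlib import Path


def read_counter(path: Path) -> int:
    """Read a single integer counter exposed as a cgroup file."""
    return int(path.read_text().strip())


def sample_container(memory_dir: Path, cpuacct_dir: Path) -> dict:
    """Take one monitoring record for a container's cgroup (cgroup v1 layout).

    memory_dir and cpuacct_dir are the container's directories in the memory
    and cpuacct hierarchies; their location depends on the host setup.
    """
    return {
        "memory_bytes": read_counter(memory_dir / "memory.usage_in_bytes"),
        "cpu_ns": read_counter(cpuacct_dir / "cpuacct.usage"),
    }


def cpu_utilisation(cpu_ns_before: int, cpu_ns_after: int,
                    interval_s: float) -> float:
    """Fraction of one CPU used between two samples taken interval_s apart."""
    return (cpu_ns_after - cpu_ns_before) / (interval_s * 1e9)
```

A monitoring component would call `sample_container` periodically and push each record, with a timestamp, to the time-series database.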

Page 13: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

Evaluation

https://noticegcc.wordpress.com/experimental-results/

Page 14: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

Experimental Setup

Compiler: v4.8.4

Program under test: random C code generator

Infrastructure: [logo] for monitoring, [logo] for storage

Meta-heuristics (search over optimizations):
• Mono-objective: Novelty Search (NS), Genetic Algorithm (GA), Random Search (RS)
• Multi-objective: Novelty Search (NS-II), NSGA-II

Algorithm parameters

Evaluation metrics (over -O0):
• Speedup (S)
• Memory consumption reduction (MR)
• CPU consumption reduction (CR)
• Trade-off <execution time - memory usage>
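The slide states only that the metrics are computed over -O0; it does not give formulas. The sketch below shows one plausible reading, where MR and CR are relative reductions in percent and S is a time ratio. These definitions and function names are assumptions, not the authors' definitions.

```python
def reduction_percent(baseline: float, optimized: float) -> float:
    """Relative reduction over the -O0 baseline, in percent (assumed formula).

    Usable for memory consumption reduction (MR) and CPU consumption
    reduction (CR); a negative result means usage increased.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - optimized) / baseline * 100.0


def speedup_ratio(time_o0: float, time_optimized: float) -> float:
    """Execution time at -O0 divided by optimized time (assumed formula)."""
    if time_optimized <= 0:
        raise ValueError("optimized time must be positive")
    return time_o0 / time_optimized
```

With these definitions, a program that uses 200 MB at -O0 and 150 MB with a given sequence has an MR of 25 percent.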

Page 15: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

Research Questions

RQ1: Mono-objective SBSE Validation.
[Diagram: training set programs + optimizations → mono-objective search on a non-functional metric → best sequence]

RQ2: Sensitivity of input programs to optimization sequences
[Diagram: unseen programs + best sequence in RQ1 → non-functional improvement]

RQ3: Impact of speedup on resource consumption.
[Diagram: training set programs + best speedup sequence in RQ1 → impact on resource consumption]

RQ4: Trade-offs between non-functional properties.
[Diagram: input program + optimizations → multi-objective search on the non-functional trade-off <time-memory> → Pareto front solutions]

Page 16: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

RQ1- Results

RQ1: Mono-objective SBSE Validation.
- Training set: 10 Csmith programs
- Average S, MR, and CR
- Comparison: Ox, RS, GA and NS

[Diagram: training set programs + optimizations → search for best optimization sequence on a non-functional metric → best sequence]

Key findings for RQ1:
– Best discovered optimization sequences using mono-objective search techniques always provide better results than standard GCC optimization levels.
– Novelty Search is a good candidate to improve code in terms of non-functional properties since it is able to discover optimization combinations that outperform RS and GA.

Page 17: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

RQ2- Results

RQ2: Sensitivity.
- 100 unseen Csmith programs
- O2 vs O3 vs NS

[Diagram: unseen programs + best sequence in RQ1 → non-functional improvement]

Key findings for RQ2:
– It is possible to build general optimization sequences that perform better than standard optimization levels.
– Best discovered sequences in RQ1 can mostly be used to improve the memory and CPU consumption of Csmith programs. To answer RQ2, Csmith programs are sensitive to compiler optimizations.

Page 18: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

RQ3- Results

RQ3: Impact of optimizations on resource consumption.
- Ox vs RS vs GA vs NS

[Diagram: training set programs + best speedup sequence in RQ1 → impact on resource (CPU & memory)]

[Charts: memory reduction and CPU reduction; annotation: increase of resource usage]

Key findings for RQ3:
– Optimizing software performance can induce undesirable effects on system resources.
– A trade-off is needed to find a correlation between software performance and resource usage.

Page 19: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

RQ4- Results

RQ4: Trade-offs between non-functional properties.
- 1 Csmith program
- Trade-off <execution time-memory usage>

[Diagram: input program + optimizations → multi-objective search on the trade-off time/memory → Pareto front solutions]

[Plot: Pareto front NS-II (multi-objective) and Pareto front NSGA-II (multi-objective), compared with Ofast, O3, O2, O1, best CPU reduction (mono-objective) and best memory reduction (mono-objective)]

Key findings for RQ4:
– NOTICE is able to construct optimization levels that represent optimal trade-offs between non-functional properties.
– NS is more effective when it is applied for mono-objective search.
– NSGA-II performs better than our NS adaptation for multi-objective optimization. However, NS-II performs clearly better than standard GCC optimizations and previously discovered sequences in RQ1.
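The notion of a Pareto front for the <execution time, memory usage> trade-off can be illustrated with a short sketch. It assumes both objectives are minimized and uses made-up measurement pairs; it is a generic non-dominated filter, not the NS-II or NSGA-II code used in the experiments.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better
    in at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (execution time in s, memory in MB) per optimization sequence.
    measurements = [(1.0, 9.0), (2.0, 5.0), (3.0, 6.0), (4.0, 2.0)]
    print(pareto_front(measurements))
```

Here (3.0, 6.0) is dropped because (2.0, 5.0) is both faster and lighter; the three remaining points are incomparable trade-offs.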

Page 20: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

Conclusion


Page 21: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

Conclusion

Summary

• Novel formulation of the compiler optimization problem based on Novelty Search
  – Novelty Search is able to generate effective optimizations
• Automated tool for automatic extraction of non-functional properties of optimized code
  – Automatically extract information about memory and CPU consumption

Future directions

• Explore more trade-offs among resource usage metrics
• Evaluate NOTICE:
  – on real world benchmarks
  – other case studies (e.g., compilers, programs, etc.)

Page 22: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

Questions?

https://noticegcc.wordpress.com/

Page 23: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

Additional slides


Page 24: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

Tool Support


Page 25: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

Literature Overview

Functional Testing of Compilers

[Cited papers: PLDI’11, PLDI’14, ICSE’16]

Page 26: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

Literature Overview

Non-Functional Testing of Compilers

[Cited papers: CGO’08, ACSAC’08, PLDI’04]

Page 27: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

Prior work is insufficient

- Performance is the major concern (e.g., speedup)

- Ignore other important non-functional properties (e.g., resource consumption properties)

- Evaluation is based on a small set of input programs (e.g., Spec CPU benchmarks)

Testing the non-functional properties poses several new challenges:

- Different cost-benefit trade-offs (e.g., speedup/memory or CPU usage)

- Finely auto-tuning compilers according to user (non-functional) requirements

Page 28: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

Problem Statement

Given a set of compiler optimization options {F1, F2, ..., Fn}, how can we find the combination that maximizes program performance better than standard optimization levels?

Do this efficiently, without the use of a priori knowledge of the optimizations and their interactions

Page 29: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

NSGA-II overview

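• NSGA-II: Non-dominated Sorting Genetic Algorithm (K. Deb et al., ’02)

[Diagram: the parent population and the offspring population are combined and ranked by non-dominated sorting into fronts F1, F2, F3, F4; crowding distance sorting then selects within the last accepted front to form the population in the next generation]

MOEA Framework http://moeaframework.org/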
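The survivor-selection step in the diagram can be sketched as follows. This is a compact, unoptimized illustration of non-dominated sorting and crowding distance for minimization problems; it is not the MOEA Framework implementation, and the function names are chosen for this example only.

```python
import math


def dominates(a, b):
    """a dominates b: no worse everywhere, strictly better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated_sort(points):
    """Return fronts F1, F2, ... as lists of indices into points."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        front = sorted(i for i in remaining
                       if not any(dominates(points[j], points[i])
                                  for j in remaining))
        fronts.append(front)
        remaining -= set(front)
    return fronts


def crowding_distance(points, front):
    """Crowding distance of each index in one front (boundaries get inf)."""
    distance = {i: 0.0 for i in front}
    for m in range(len(points[front[0]])):
        ordered = sorted(front, key=lambda i: points[i][m])
        lo, hi = points[ordered[0]][m], points[ordered[-1]][m]
        distance[ordered[0]] = distance[ordered[-1]] = math.inf
        if hi == lo:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            distance[cur] += (points[nxt][m] - points[prev][m]) / (hi - lo)
    return distance


def next_generation(points, size):
    """Fill the next population front by front; break ties in the last
    accepted front by descending crowding distance."""
    survivors = []
    for front in non_dominated_sort(points):
        if len(survivors) + len(front) <= size:
            survivors.extend(front)
        else:
            d = crowding_distance(points, front)
            ranked = sorted(front, key=lambda i: d[i], reverse=True)
            survivors.extend(ranked[: size - len(survivors)])
            break
    return survivors
```

In NSGA-II, `points` would hold the objective values of the combined parent and offspring populations, and `size` the population size to carry over.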

Page 30: QRS16: NOTICE: A Framework for Non-functional Testing of Compilers

NSGA-II overview
