A Doctoral Defense
Abdel Salam Sayyad
April 23, 2014
Evolutionary Search Techniques
with Strong Heuristics
for Multi-Objective Feature Selection
in Software Product Lines
Sound Bites
• In modern-day software requirements / planning / design /
deployment:
– Software Correctness is only one part of the user preferences.
– Single-objective optimization narrows the view.
– So does the aggregation of multiple objectives.
• Widely-used Boolean Domination truncates the data.
– Continuous Domination exploits the richness in user preferences.
• Let your model guide your search:
– Use strong heuristics derived from the domain model.
– Tune the search parameters to “tread lightly.”
• Scalability to large, real-life models is the ultimate test.
– Innovation in method is key to scalability.
April 23, 2014 - Abdel Salam Sayyad - Doctoral Defense
Outline
• Modeling the software product line – “feature models”
• Automated analysis of feature models
• Problem formulation – “multi-objective feature selection”
• Multi-Objective Evolutionary Algorithms (MOEAs)
– Boolean domination vs. continuous domination
– Survey of Pareto-Optimal Search-Based Software Engineering
• Results
– Increasing the number of objectives
– Parameter tuning
– The “PUSH” and “PULL” heuristics
• Scalability of results
– The “population seeding” heuristic
• Future Work
Feature Models
• De facto standard for
modeling variability in
Software Product Lines
Cross-Tree Constraints
Size: 10 Features, 8 Rules
Cardinality: 16 Valid Configurations
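A feature model of this size can be made concrete in a few lines of code. The sketch below uses an illustrative toy model (not the one on the slide): tree and cross-tree constraints are expressed as predicates over a 0/1 selection vector, and the model's cardinality is found by brute-force enumeration.

```python
from itertools import product

# Illustrative toy feature model, encoded as predicates over a selection.
FEATURES = ["root", "gui", "cli", "net", "ssl"]

def is_valid(sel):
    s = dict(zip(FEATURES, sel))
    if not s["root"]:                  # the root is always selected
        return False
    if s["ssl"] and not s["net"]:      # a child requires its parent
        return False
    if not (s["gui"] or s["cli"]):     # or-group under root: pick at least one
        return False
    return True

# Enumerate the model's "cardinality": its set of valid configurations.
valid = [sel for sel in product([0, 1], repeat=len(FEATURES)) if is_valid(sel)]
print(len(valid))  # -> 9
```

Real analyses replace this brute-force enumeration with BDD, CSP, or SAT solvers, as the following slides discuss; enumeration only works at toy scale.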
Complexity of Feature Models
Size: 290 Features, 426 Rules
Cardinality = 2.26 × 10^49 [Pohl '11]
Complexity of Feature Models
Size: 6888 Features, 344,000 Rules
SPLOT model vs. LVAT (Linux) models
• T. Berger, S. She, R. Lotufo, A. Wasowski, and K. Czarnecki, "A Study of Variability Models and Languages in the
Systems Software Domain," IEEE Transactions on Software Engineering, 2013
Property         SPLOT                   Linux
Size             Significantly smaller   Significantly larger
Constraints      Significantly fewer     Significantly more
Feature Groups   High ratio              Low ratio
Leaves           Deeper                  Shallower
Many researchers have used randomly generated feature models from
SPLOT to demonstrate the scalability of their results, but those models
embody the same assumptions as the original SPLOT models.
Automated Analysis of Feature Models
• Most previous work focused on the correctness aspect:
– Model checking
– Fixing model inconsistencies
– Deriving valid configurations (products)
– Enumerating all valid configurations (products)
• Most previous work used exact algorithms:
– Binary Decision Diagram (BDD) solvers
– Constraint Satisfaction Problem (CSP) solvers
– Satisfiability (SAT) solvers
• Some explored efficient product line testing scenarios.
• Some explored product configuration with multiple
objectives, but aggregated the objectives into one formula.
Automated Analysis of Feature Models
[Chart: example previous work plotted by model size (9, 290, 544, and 6888 features, spanning SPLOT and Linux/LVAT models) against the number of objectives. Single-objective: Benavides '05; White '07, '08, '09a, '09b; Shi '10; Guo '11. Multi-objective: Pohl '11; Lopez-Herrejon '11; Johansen '11; Henard '12.]
• R. Pohl, K. Lauenroth, and K. Pohl, "A Performance Comparison of Contemporary Algorithmic Approaches for
Automated Analysis Operations on Feature Models," in Proc. ASE, Lawrence, KS, USA, 2011, pp. 313-322
Restricted to cardinality < 3 × 10^6
(< 80 features)
Multi-Objective Optimization
[Diagram: dominated solutions vs. the non-dominated set (the Pareto front). The alternative, a weighted sum, combines N objectives into one with some weighting scheme.]
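Pareto dominance itself takes only a few lines; this sketch assumes all objectives are minimized:

```python
# x dominates y iff x is no worse in every objective and strictly
# better in at least one (all objectives minimized).
def dominates(x, y):
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

# The Pareto front is the set of points no other point dominates.
def pareto_front(points):
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

pts = [(1, 5), (2, 2), (4, 1), (3, 3), (5, 5)]
print(pareto_front(pts))   # (3, 3) and (5, 5) are dominated
```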
Defining the Optimization Objectives
Suppose each feature had the following metrics:
1. Boolean USED_BEFORE?
2. Integer DEFECTS
3. Real COST
Find the space of “best options” according to these objectives:
1. Satisfy the most domain constraints (0 ≤ #violations ≤ 100%)
2. Offer the most features
3. Prefer features we have used before
4. Prefer features with the fewest known defects
5. Prefer features with the least cost
Feature Selection Problem
Choose a subset of features within given time (search
budget) T, such that:
1. All model constraints are satisfied.
2. Solutions are not dominated by any other solutions
found up to time T.
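This formulation can be encoded as a bit vector over the features, with the five objectives from the previous slide evaluated per candidate. The metric tables and rules below are illustrative stand-ins (not real data), and maximized objectives are negated so that everything is minimized:

```python
# Illustrative per-feature metrics (stand-ins, not real data).
USED_BEFORE = [1, 0, 1, 1, 0]
DEFECTS     = [0, 3, 1, 0, 2]
COST        = [1.0, 2.5, 0.5, 4.0, 1.5]
RULES = [lambda s: s[0] == 1,        # the root feature must be selected
         lambda s: s[4] <= s[3]]     # feature 4 requires feature 3

def count_violations(sel):
    return sum(1 for rule in RULES if not rule(sel))

def objectives(sel):
    picked = lambda table: sum(v for v, s in zip(table, sel) if s)
    return (count_violations(sel),    # 1. minimize rule violations
            -sum(sel),                # 2. maximize selected features
            -picked(USED_BEFORE),     # 3. maximize reuse
            picked(DEFECTS),          # 4. minimize known defects
            picked(COST))             # 5. minimize cost

print(objectives([1, 1, 1, 0, 0]))   # -> (0, -3, -2, 4, 4.0)
```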
Genetic Algorithms
… and the “optimum” solution is:
The fittest individual in the
final generation.
Multi-Objective Optimization
The Pareto Front → Higher-level Decision Making → The Chosen Solution
Fitness Ranking (NSGA-II)
• K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, "A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II,"
IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182-197, 2002
Boolean Dominance
Only the “unbeaten” survive
Fitness Ranking (IBEA)
• Repeat until the union of Pt and Qt is down to the size of Pt:
– Compare every individual's dominance with respect to
every other individual
– Sort all individuals by fitness F
– Delete the worst, recalculate, delete the worst, recalculate, …
Continuous Dominance
Individual gets credit for “amount of
dominance” according to all objectives
• E. Zitzler and S. Kunzli, "Indicator-Based Selection in Multiobjective Search," in Parallel Problem Solving from
Nature. Berlin, Germany: Springer-Verlag, 2004, pp. 832–842
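The continuous fitness can be sketched with the additive epsilon indicator from the IBEA paper. Objective normalization is omitted here for brevity, and kappa is the paper's scaling constant; this is a minimal sketch, not the study's implementation.

```python
import math

# Additive epsilon indicator (minimization): the smallest shift by
# which point a would weakly dominate point b.
def eps_indicator(a, b):
    return max(fa - fb for fa, fb in zip(a, b))

# IBEA fitness: each individual is penalized by every other individual,
# scaled by how strongly the other (nearly) dominates it.
def ibea_fitness(pop, kappa=0.05):
    return [sum(-math.exp(-eps_indicator(y, x) / kappa)
                for y in pop if y is not x)
            for x in pop]

pop = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
fit = ibea_fitness(pop)   # higher fitness is better; worst is deleted first
```

Unlike Boolean dominance, a dominated individual still receives graded credit for being close to the front, which is the "amount of dominance" idea above.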
A “variety” of MOEAs?
Other MOEAs and their ranking criteria:
3) SPEA2 [Zitzler '01]: more focus on diversity, with a new diversity measure.
4) FastPGA [Eskandari '07]: borrows ranking criteria from NSGA-II and the diversity measure from SPEA2.
5) MOCell (Cellular GA) [Nebro '09]: NSGA-II ranking.
All use Boolean Dominance: only the “unbeaten” survive.
Survey of Pareto-Optimal Search-Based Software Engineering
51 papers surveyed; only those using Pareto optimization were included.
• A.S. Sayyad and H. Ammar, "Pareto-Optimal Search-Based Software Engineering: A Literature Survey," in Proc.
RAISE, San Francisco, USA, 2013
Most papers
explored small
objective spaces
(2 or 3 objectives)
Survey of Pareto-Optimal Search-Based Software Engineering
51 papers surveyed; only those using Pareto optimization were included.
• A.S. Sayyad and H. Ammar, "Pareto-Optimal Search-Based Software Engineering: A Literature Survey," in Proc.
RAISE, San Francisco, USA, 2013
Most papers that
deployed a single
algorithm chose
NSGA-II
Most papers that
deployed a single
algorithm did not
provide a good reason
Increasing the number of objectives
• A.S. Sayyad, T. Menzies, and H. Ammar, "On the Value of User Preferences in Search-Based Software Engineering:
A Case Study in Software Product Lines," in Proc. ICSE, San Francisco, USA, 2013, pp. 492-501
• %CORRECT = % of solutions that are fully correct
• HV = Hypervolume of dominated region
• SPREAD = measure of diversity
Increasing the number of objectives
• Algorithms based on Boolean dominance perform badly.
• Default parameter settings were used (e.g. crossover = 90%)
• Took 3 hours to get 52% correct solutions for E-Shop.
• A.S. Sayyad, T. Menzies, and H. Ammar, "On the Value of User Preferences in Search-Based Software Engineering:
A Case Study in Software Product Lines," in Proc. ICSE, San Francisco, USA, 2013, pp. 492-501
Parameter Tuning
• A replication of: A. Arcuri and G. Fraser, "Parameter Tuning or Default Values? An Empirical
Investigation in Search-Based Software Engineering," Empirical Software Engineering, Feb. 2013.
• A.S. Sayyad, K. Goseva-Popstojanova, T. Menzies, and H. Ammar, "On Parameter Tuning in Search-Based
Software Engineering: A Replicated Empirical Study," in Proc. RESER, Baltimore, USA, 2013
Parameter Tuning
• 20 feature models from SPLOT:
• A.S. Sayyad, K. Goseva-Popstojanova, T. Menzies, and H. Ammar, "On Parameter Tuning in Search-Based
Software Engineering: A Replicated Empirical Study," in Proc. RESER, Baltimore, USA, 2013
Parameter Tuning Results
• Different parameter settings cause very large variance in the
performance. [Agree with Arcuri & Fraser]
• Default parameter settings perform generally poorly, but might perform
relatively well on individual problem instances. [Stronger than Arcuri &
Fraser]
• Tuning on a sample of problem instances does not, in general, yield the
best parameter values for a new problem instance, but the obtained
settings are generally better than the default settings. [Better than Arcuri
& Fraser]
• Best parameter settings across 20 SPLOT feature models:
– Algorithm: IBEA
– Population = 50
– Crossover rate = 0
– Mutation rate = 0.5/FEATURES
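For illustration, the tuned variation step amounts to disabling crossover and applying bit-flip mutation at the reported rate. The loop below is a hypothetical sketch of those settings, not the study's actual implementation:

```python
import random

# Tuned settings: population 50, crossover rate 0, mutation 0.5/FEATURES.
POPULATION = 50

def mutate(bits, rng=random):
    rate = 0.5 / len(bits)             # mutation rate = 0.5 / FEATURES
    return [b ^ (rng.random() < rate) for b in bits]

rng = random.Random(0)
pop = [[rng.randint(0, 1) for _ in range(20)] for _ in range(POPULATION)]
next_pop = [mutate(ind, rng) for ind in pop]   # no crossover step at all
```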
• A.S. Sayyad, K. Goseva-Popstojanova, T. Menzies, and H. Ammar, "On Parameter Tuning in Search-Based
Software Engineering: A Replicated Empirical Study," in Proc. RESER, Baltimore, USA, 2013
Using Strong Heuristics to Improve Performance
• The evolutionary learning of model rules proved to work,
but resulted in long run times for larger models.
• When feature model dependencies and constraints are
not respected, much time is wasted in 5-objective
evaluation of invalid configurations.
• Search smarter. Exploit the model.
The PUSH heuristic (SPLOT models)
• In SPLOT models, tree structure is explicit.
• Introduce “Tree Mutation”:
Do not mutate if the flip would:
• select a feature whose parent is not selected,
• deselect a mandatory child feature whose parent is selected, or
• violate a group's cardinality.
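These guard conditions can be sketched as a mutation filter. PARENT, MANDATORY, and GROUPS below are illustrative stand-ins for the explicit SPLOT tree structure:

```python
# Illustrative tree structure: child -> parent, mandatory children,
# and feature groups with (low, high) cardinality bounds.
PARENT    = {1: 0, 2: 0, 3: 1, 4: 1}
MANDATORY = {1}
GROUPS    = [((3, 4), 1, 1)]           # alternative group: exactly one of 3, 4

def mutation_allowed(sel, i):
    f = list(sel)
    f[i] ^= 1                          # the proposed bit flip
    if f[i] and i in PARENT and not f[PARENT[i]]:
        return False                   # would select under an unselected parent
    if not f[i] and i in MANDATORY and f[PARENT[i]]:
        return False                   # would deselect a mandatory child
    for members, lo, hi in GROUPS:
        if f[PARENT[members[0]]] and not lo <= sum(f[m] for m in members) <= hi:
            return False               # would violate group cardinality
    return True
```

Applied inside the mutation operator, invalid flips are simply skipped, which keeps the search inside (or near) the valid region.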
The PUSH heuristic (LVAT models)
• In LVAT models, tree
structure is NOT explicit.
• Introduce “Feature Fixing”:
• Look for mandatory or
dead features.
• Fix those features during the evolution.
• Skip rules with only
those features.
• A.S. Sayyad, J. Ingram, T. Menzies, and H. Ammar, "Scalable Product Line Configuration: A Straw to Break the
Camel's Back," in Proc. ASE, Palo Alto, USA, 2013, pp. 465-474
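"Feature fixing" can be illustrated by brute force on a toy rule set: a feature is mandatory (core) if every valid configuration selects it, and dead if none does, so both can be fixed before the search starts. The RULES below are an illustrative stand-in; a real LVAT-scale model would need a SAT solver rather than enumeration.

```python
from itertools import product

N = 4
RULES = [lambda s: s[0] == 1,          # feature 0 is required everywhere
         lambda s: not s[1] or s[0],   # feature 1 implies feature 0
         lambda s: s[3] == 0]          # feature 3 can never be selected

def fixed_features():
    valid = [s for s in product([0, 1], repeat=N) if all(r(s) for r in RULES)]
    core = {i for i in range(N) if all(s[i] for s in valid)}
    dead = {i for i in range(N) if not any(s[i] for s in valid)}
    return core, dead

core, dead = fixed_features()
print(core, dead)   # feature 0 is mandatory, feature 3 is dead
```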
The PULL heuristic
• Give more weight to minimizing violations than other
objectives.
Objectives:
• Maximize features
• Maximize code re-use
• Minimize defects
• Minimize cost
• Minimize violations (replicated four times in the objective vector)
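One simple way to realize PULL under continuous dominance is to replicate the violations objective in the objective vector, so that the indicator feels violations several times over. The replication factor below is illustrative:

```python
# Replicate the violations objective so continuous-dominance indicators
# weigh it more heavily than the other four objectives (all minimized).
def pulled_objectives(violations, nfeatures, reuse, defects, cost, pull=4):
    return [violations] * pull + [-nfeatures, -reuse, defects, cost]

print(pulled_objectives(2, 10, 3, 1, 5.0))
```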
Experiment with PUSH and PULL
• 20 feature models from SPLOT. (see slide 29)
• 7 feature models from LVAT.
• Comparing IBEA and NSGA-II.
• Each algorithm run 10 times. 5 minutes/run.
Results (PUSH without PULL)
%CR = % of solutions that
are fully correct
HV = Hypervolume of
dominated region
SPRD = Spread (measure of
diversity)
Results (PUSH and PULL)
Time to achieve 100% correct population
[Chart: time (sec) to reach a 100% correct population]
The “population seeding” heuristic
• Generate a good seed using 2-objective optimization
(minimize violations, maximize features).
• Plant the seed in the initial population for 5-objective
optimization.
Seeder → Grower
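The seeder/grower pipeline can be sketched as follows: a cheap 2-objective hill climb (minimize violations, then maximize features) produces a seed that is planted into the initial population of the full 5-objective run. The loops below are illustrative stand-ins for complete MOEA runs:

```python
import random

# Seeder: a simple 2-objective hill climb; evaluate2 returns a tuple
# (violations, -features) compared lexicographically (lower is better).
def seeder(evaluate2, n, steps=500, rng=random):
    best = [rng.randint(0, 1) for _ in range(n)]
    for _ in range(steps):
        cand = [b ^ (rng.random() < 1.0 / n) for b in best]
        if evaluate2(cand) <= evaluate2(best):
            best = cand
    return best

# Grower: plant the seed among otherwise random initial individuals,
# then hand the whole population to the 5-objective MOEA.
def initial_population(seed, size, rng=random):
    return [seed] + [[rng.randint(0, 1) for _ in seed]
                     for _ in range(size - 1)]
```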
Result for “population seeding”
• A.S. Sayyad, J. Ingram, T. Menzies, and H. Ammar, "Scalable Product Line Configuration: A Straw to Break the
Camel's Back," in Proc. ASE, Palo Alto, USA, 2013, pp. 465-474
Diversity of solutions achieved with seeding
• Valid solutions are
strongly influenced
by the seed.
• Diminishing returns
after 5 hours.
• Useful for reference-
point optimization.
Using seeds to support reference-point
optimization
Seeds
Future Work
• Bigger models; more objectives.
• User-in-the-loop configuration.
• Distributed, co-evolving populations.
• Adaptive parameter control.
• Combining MOEAs with the Z3 SMT solver.
Sound Bites
• In modern-day software requirements / planning / design /
deployment:
– Software Correctness is only one part of the user preferences.
– Single-objective optimization narrows the view.
– So does the aggregation of multiple objectives.
• Widely-used Boolean Domination truncates the data.
– Continuous Domination exploits the richness in user preferences.
• Let your model guide your search:
– Use strong heuristics derived from the domain model.
– Tune the search parameters to “tread lightly.”
• Scalability to large, real-life models is the ultimate test.
– Innovation in method is key to scalability.
A Special Thank you
To my Ph.D. examining committee:
• Prof. Hany Ammar (chair)
• Prof. Tim Menzies (co-chair)
• Prof. Ramana Reddy
• Prof. Ali Mili
• Prof. Mario Perhinschi