Learning to Branch
Ellen Vitercik
Joint work with Nina Balcan, Travis Dick, and Tuomas Sandholm
Published in ICML 2018
Integer Programs (IPs)
maximize c · x subject to Ax ≤ b
x ∈ {0,1}^n
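For tiny instances, a 0-1 IP of this form can be solved by brute-force enumeration. A minimal Python sketch (illustrative only; real solvers use branch and bound, as described later in this talk):

```python
from itertools import product

def solve_binary_ip(c, A, b):
    """Brute-force a 0-1 IP: maximize c.x subject to A x <= b, x in {0,1}^n.

    Exponential in n, so only usable for tiny instances.
    """
    n = len(c)
    best_val, best_x = float("-inf"), None
    for x in product((0, 1), repeat=n):
        # Check every constraint row: A_j . x <= b_j
        if all(sum(a * xi for a, xi in zip(row, x)) <= bj
               for row, bj in zip(A, b)):
            val = sum(ci * xi for ci, xi in zip(c, x))
            if val > best_val:
                best_val, best_x = val, x
    return best_val, best_x

# The knapsack-style instance used as this talk's running example:
c = [40, 60, 10, 10, 3, 20, 60]
A = [[40, 50, 30, 10, 10, 40, 30]]
b = [100]
val, x = solve_binary_ip(c, A, b)   # val == 133, x == (0, 1, 0, 1, 1, 0, 1)
```

Enumeration visits 2^n points, which is exactly why the NP-hardness of IPs motivates branch and bound.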
Facility location problems can be formulated as IPs.
Clustering problems can be formulated as IPs.
Binary classification problems can be formulated as IPs.
Integer Programs (IPs)
maximize c · x subject to Ax = b
x ∈ {0,1}^n
NP-hard
Branch and Bound (B&B)
• Most widely used algorithm for solving IPs (CPLEX, Gurobi)
• Recursively partitions the search space to find an optimal solution
• Organizes the partition as a tree
• Many parameters
  • CPLEX has a 221-page manual describing 135 parameters: "You may need to experiment."
Why is tuning B&B parameters important?
• Save time
• Solve more problems
• Find better solutions
B&B in the real world
A delivery company routes trucks daily, using integer programming to select routes.
Demand changes every day, so the company solves hundreds of similar optimization problems.
Using this set of typical problems, can we learn the best parameters?
Model
An application-specific distribution supplies sample IPs (A_1, b_1, c_1), …, (A_m, b_m, c_m) to the algorithm designer, who chooses B&B parameters.
How can we use samples to find the best B&B parameters for my domain?
Model
This model has been studied in applied communities [Hutter et al. '09].
Model
It has also been studied from a theoretical perspective [Gupta and Roughgarden '16; Balcan et al. '17].
Model
1. Fix a set of B&B parameters to optimize.
2. Receive sample problems (A_1, b_1, c_1), (A_2, b_2, c_2), … from an unknown distribution.
3. Find the parameters with the best performance on the samples.
("Best" could mean smallest search tree, for example.)
Questions to address
How to find parameters that are best on average over samples?
Will those parameters have high performance in expectation?
Outline
1. Introduction
2. Branch-and-Bound
3. Learning algorithms
4. Experiments
5. Conclusion and Future Directions
max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100
x ∈ {0,1}^7
Root node: LP relaxation solution (1/2, 1, 0, 0, 0, 0, 1), objective value 140.
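Because this instance has a single ≤ constraint, its LP relaxation is a fractional knapsack, which a greedy pass by value-to-weight ratio solves exactly; that is where the fractional root solution (1/2, 1, 0, 0, 0, 0, 1) with value 140 comes from. A sketch of that relaxation (assuming one constraint; `fixed` records branching decisions and is a name introduced here for illustration):

```python
def knapsack_lp(c, w, cap, fixed=None):
    """LP relaxation of a 0-1 knapsack: maximize c.x s.t. w.x <= cap,
    0 <= x <= 1, with some variables fixed to 0/1 by branching.

    Greedy by value/weight ratio is optimal for this single-constraint LP.
    Returns (objective value, solution), or None if the fixed variables
    already violate the constraint.
    """
    n = len(c)
    fixed = fixed or {}
    x = [0.0] * n
    remaining = cap
    for i, xi in fixed.items():
        x[i] = float(xi)
        remaining -= w[i] * xi
    if remaining < 0:
        return None
    free = sorted((i for i in range(n) if i not in fixed),
                  key=lambda i: c[i] / w[i], reverse=True)
    value = sum(c[i] * x[i] for i in range(n))
    for i in free:
        take = min(1.0, remaining / w[i])   # last item taken may be fractional
        x[i] = take
        value += c[i] * take
        remaining -= w[i] * take
        if remaining <= 0:
            break
    return value, x

c = [40, 60, 10, 10, 3, 20, 60]
w = [40, 50, 30, 10, 10, 40, 30]
value, x = knapsack_lp(c, w, 100)   # value == 140.0, x == [0.5, 1, 0, 0, 0, 0, 1]
```

General IPs need a real LP solver at each node; the greedy shortcut works only for this knapsack structure.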
B&B
1. Choose a leaf of the tree.
B&B
1. Choose a leaf of the tree.
2. Branch on a variable.
Branching on x_1 at the root:
• x_1 = 0: LP solution (0, 1, 0, 1, 0, 1/4, 1), value 135
• x_1 = 1: LP solution (1, 3/5, 0, 0, 0, 0, 1), value 136
Branching on x_2 under x_1 = 1:
• x_2 = 0: LP solution (1, 0, 0, 1, 0, 1/2, 1), value 120
• x_2 = 1: LP solution (1, 1, 0, 0, 0, 0, 1/3), value 120
Branching on x_6 under x_1 = 0:
• x_6 = 0: LP solution (0, 1, 1/3, 1, 0, 0, 1), value 133.3
• x_6 = 1: LP solution (0, 3/5, 0, 0, 0, 1, 1), value 116
Branching on x_3 under x_6 = 0:
• x_3 = 0: LP solution (0, 1, 0, 1, 1, 0, 1), value 133 (integral)
• x_3 = 1: LP solution (0, 4/5, 1, 0, 0, 0, 1), value 118
B&B
1. Choose a leaf of the tree.
2. Branch on a variable.
3. Fathom a leaf if:
   i. the LP relaxation solution is integral,
   ii. the LP relaxation is infeasible, or
   iii. the LP relaxation solution is no better than the best-known integral solution.
The leaf (0, 1, 0, 1, 1, 0, 1) with value 133 is integral, so it is fathomed and becomes the best-known integral solution.
Every remaining leaf has LP value at most 133, so none can beat the best-known integral solution: all are fathomed, and B&B terminates with optimal solution (0, 1, 0, 1, 1, 0, 1), value 133.
This talk: How do we choose which variable to branch on? (Assume every other aspect of B&B is fixed.)
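The three steps can be sketched end to end for the single-constraint case. A simplified, hypothetical implementation: a greedy fractional-knapsack relaxation as the LP solver, depth-first leaf selection, and a pluggable variable-selection callback (real solvers are far more sophisticated):

```python
import math

def lp_relax(c, w, cap, fixed):
    """Greedy LP relaxation of a 0-1 knapsack with branching decisions
    in `fixed` (var index -> 0 or 1). Returns (value, x) or None."""
    n = len(c)
    x = [0.0] * n
    remaining = cap - sum(w[i] * v for i, v in fixed.items())
    if remaining < 0:
        return None
    for i, v in fixed.items():
        x[i] = float(v)
    value = sum(c[i] * v for i, v in fixed.items())
    for i in sorted((j for j in range(n) if j not in fixed),
                    key=lambda j: c[j] / w[j], reverse=True):
        take = min(1.0, remaining / w[i])
        x[i] = take
        value += c[i] * take
        remaining -= w[i] * take
        if remaining <= 0:
            break
    return value, x

def branch_and_bound(c, w, cap, choose_var):
    """Depth-first B&B; `choose_var(x, frac)` picks the variable to branch on.
    Returns (best integral value, number of tree nodes built)."""
    best = -math.inf
    stack = [{}]            # each node is its dict of branching decisions
    nodes = 0
    while stack:
        fixed = stack.pop()
        nodes += 1
        res = lp_relax(c, w, cap, fixed)
        if res is None:
            continue                       # fathom: infeasible
        value, x = res
        if value <= best:
            continue                       # fathom: bound no better than incumbent
        frac = [i for i in range(len(c)) if 0.0 < x[i] < 1.0]
        if not frac:
            best = value                   # fathom: integral, new incumbent
            continue
        i = choose_var(x, frac)
        stack.append({**fixed, i: 0})
        stack.append({**fixed, i: 1})
    return best, nodes

c = [40, 60, 10, 10, 3, 20, 60]
w = [40, 50, 30, 10, 10, 40, 30]
best, nodes = branch_and_bound(c, w, 100, lambda x, frac: frac[0])   # best == 133
```

Swapping in a different `choose_var` changes `nodes`, which is exactly the tree-size effect the rest of the talk studies.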
Variable selection policies can have a huge effect on tree size
Outline
1. Introduction
2. Branch-and-Bounda. Algorithm Overview
b. Variable Selection Policies
3. Learning algorithms
4. Experiments
5. Conclusion and Future Directions
Variable selection policies (VSPs)
Score-based VSP:
At leaf Q, branch on the variable x_i maximizing score(Q, i).
Many options! Little is known about which to use when.
Variable selection policies
For an IP instance Q:
• Let c_Q be the objective value of its LP relaxation.
• Let Q_i^- be Q with x_i set to 0, and let Q_i^+ be Q with x_i set to 1.
Example. At the root of the running knapsack example, the LP relaxation has solution (1/2, 1, 0, 0, 0, 0, 1) and objective value c_Q = 140.
Example (continued). Branching on x_1 produces Q_1^- (with c_{Q_1^-} = 135) and Q_1^+ (with c_{Q_1^+} = 136).
The linear rule (parameterized by μ) [Linderoth & Savelsbergh, 1999]
Branch on the variable x_i maximizing:
score(Q, i) = μ · min(c_Q − c_{Q_i^-}, c_Q − c_{Q_i^+}) + (1 − μ) · max(c_Q − c_{Q_i^-}, c_Q − c_{Q_i^+})
Variable selection policies
The (simplified) product rule [Achterberg, 2009]
Branch on the variable x_i maximizing:
score(Q, i) = (c_Q − c_{Q_i^-}) · (c_Q − c_{Q_i^+})
And many more…
Our parameterized rule
Given d scoring rules score_1, …, score_d, the goal is to learn the best convex combination μ_1 score_1 + ⋯ + μ_d score_d.
Branch on the variable x_i maximizing:
score(Q, i) = μ_1 score_1(Q, i) + ⋯ + μ_d score_d(Q, i)
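Once the LP-objective changes c_Q − c_{Q_i^-} and c_Q − c_{Q_i^+} are computed (by LP solves, omitted here), all of these rules reduce to simple arithmetic. An illustrative sketch with hypothetical delta values:

```python
def linear_score(mu, d_minus, d_plus):
    """Linear rule: mu * min + (1 - mu) * max of the two LP-objective changes."""
    return mu * min(d_minus, d_plus) + (1 - mu) * max(d_minus, d_plus)

def product_score(d_minus, d_plus):
    """Simplified product rule."""
    return d_minus * d_plus

def combined_choice(weights, rules, deltas):
    """Branch on the variable maximizing a convex combination of scoring rules.

    deltas[i] = (c_Q - c_{Q_i^-}, c_Q - c_{Q_i^+}) for variable i.
    """
    def score(i):
        return sum(wt * rule(*deltas[i]) for wt, rule in zip(weights, rules))
    return max(range(len(deltas)), key=score)

# Hypothetical deltas for three variables, with score_1 = min and score_2 = max,
# so the convex combination recovers the linear rule:
rules = (lambda dm, dp: min(dm, dp), lambda dm, dp: max(dm, dp))
deltas = [(5, 4), (0, 10), (2, 2)]
# Weighting score_1 heavily picks variable 0; weighting score_2 heavily picks variable 1.
```

The branching decision thus depends on the weights, which is why learning them matters.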
How can we use samples to find the best B&B parameters, i.e., the weights μ_1, …, μ_d, for my domain?
Outline
1. Introduction
2. Branch-and-Bound
3. Learning algorithmsa. First-try: Discretization
b. Our Approach
4. Experiments
5. Conclusion and Future Directions
First try: Discretization
1. Discretize parameter space
2. Receive sample problems from unknown distribution
3. Find params in discretization with best average performance
[Plot: average tree size as a function of μ]
First try: Discretization
This has been prior work's approach [e.g., Achterberg (2009)].
Discretization gone wrong
[Plot: average tree size as a function of μ; every point in the discretization can miss the region where trees are small]
This can actually happen!
Discretization gone wrong
Theorem [informal]. For any discretization, there exists a problem-instance distribution D inducing this behavior.
Proof ideas:
• D's support consists of infeasible IPs with "easy-out" variables.
• B&B takes exponential time unless it branches on the "easy-out" variables.
• B&B only finds the "easy outs" if it uses parameters from a specific range.
Outline
1. Introduction
2. Branch-and-Bound
3. Learning algorithmsa. First-try: Discretization
b. Our Approachi. Single-parameter settings
ii. Multi-parameter settings
4. Experiments
5. Conclusion and Future Directions
Simple assumption
There exists a cap H on the size of the largest tree we are willing to build.
Common assumption, e.g.:
• Hutter, Hoos, Leyton-Brown, Stützle, JAIR '09
• Kleinberg, Leyton-Brown, Lucier, IJCAI '17
Useful lemma
Lemma: For any two scoring rules and any IP Q, O((# variables)^(H+2)) intervals partition [0, 1] such that, for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b].
(Much smaller in our experiments!)
[Plot: for each variable i, the line μ ↦ μ · score_1(Q, i) + (1 − μ) · score_2(Q, i); the branching variable (x_1, x_2, or x_3) is determined by which line is highest, and changes only at the lines' intersection points.]
For any μ in the first highlighted interval, B&B branches on x_2, producing children Q_2^- (x_2 = 0) and Q_2^+ (x_2 = 1).
Recursing at Q_2^-: the lines μ ↦ μ · score_1(Q_2^-, i) + (1 − μ) · score_2(Q_2^-, i) subdivide the interval into subintervals where B&B branches on x_2 then x_3, and where it branches on x_2 then x_1.
Each subinterval is subdivided further as the tree deepens (e.g., branching on x_3 produces children with x_3 = 0 and x_3 = 1).
Useful lemma
Proof idea.
• Continue dividing [0, 1] into intervals such that, within each interval, the variable-selection order is fixed.
• [0, 1] can be subdivided only a finite number of times.
• The proof follows by induction on tree depth.
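The first subdivision step of this proof can be made concrete: for one branching decision, each variable contributes a line μ ↦ μ · score_1(Q, i) + (1 − μ) · score_2(Q, i), and the chosen variable is constant between consecutive intersection points of these lines. A sketch under that assumption (the score values below are hypothetical):

```python
def branching_intervals(s1, s2, eps=1e-12):
    """Partition [0, 1] into intervals on which the argmax of
    mu * s1[i] + (1 - mu) * s2[i] over variables i is constant.

    Returns a list of ((lo, hi), argmax_variable) pairs.
    """
    n = len(s1)

    def argmax_at(mu):
        return max(range(n), key=lambda i: mu * s1[i] + (1 - mu) * s2[i])

    # Candidate breakpoints: pairwise line intersections inside (0, 1).
    cuts = {0.0, 1.0}
    for i in range(n):
        for j in range(i + 1, n):
            denom = (s1[i] - s2[i]) - (s1[j] - s2[j])
            if abs(denom) > eps:      # skip parallel lines
                mu = (s2[j] - s2[i]) / denom
                if 0.0 < mu < 1.0:
                    cuts.add(mu)
    cuts = sorted(cuts)

    intervals = []
    for lo, hi in zip(cuts, cuts[1:]):
        winner = argmax_at((lo + hi) / 2)   # argmax is constant on (lo, hi)
        if intervals and intervals[-1][1] == winner:
            lo = intervals.pop()[0][0]      # merge with the previous interval
        intervals.append(((lo, hi), winner))
    return intervals

# Three variables under two scoring rules:
s1 = [0.0, 1.0, 0.5]   # score_1(Q, i)
s2 = [1.0, 0.0, 0.6]   # score_2(Q, i)
iv = branching_intervals(s1, s2)   # argmax sequence: variable 0, then 2, then 1
```

Recursing on each interval with the children's scores reproduces the lemma's finer and finer partition.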
Learning algorithm
Input: a set of IPs sampled from a distribution D.
For each IP, set μ = 0. While μ < 1:
1. Run B&B using μ · score_1 + (1 − μ) · score_2, producing a tree T.
2. Find the interval [μ, μ′) such that running B&B with the scoring rule μ″ · score_1 + (1 − μ″) · score_2, for any μ″ ∈ [μ, μ′), builds the same tree T (this takes a little bookkeeping).
3. Set μ = μ′.
Return: any μ̂ from the interval minimizing average tree size.
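The return step can be sketched in isolation. Suppose the sweep has produced, for each sample IP, a piecewise-constant tree-size function over μ (a hypothetical representation; in reality the bookkeeping lives inside B&B). Averaging these functions and returning a minimizer is then straightforward:

```python
def best_parameter(piecewise_sizes):
    """Each element of piecewise_sizes is (cuts, sizes): tree size is
    sizes[k] for mu in [cuts[k], cuts[k+1]), with cuts[0] == 0, cuts[-1] == 1.

    Returns (mu_hat, avg_size): a parameter from the interval minimizing
    average tree size over all samples, and that average.
    """
    # Merge all breakpoints so every refined interval has a constant size
    # under every sample's function.
    all_cuts = sorted({c for cuts, _ in piecewise_sizes for c in cuts})

    def size_at(cuts, sizes, mu):
        # Index of the piece containing mu.
        k = max(k for k, c in enumerate(cuts[:-1]) if c <= mu)
        return sizes[k]

    best_mu, best_avg = None, float("inf")
    for lo, hi in zip(all_cuts, all_cuts[1:]):
        mid = (lo + hi) / 2
        avg = sum(size_at(cuts, sizes, mid)
                  for cuts, sizes in piecewise_sizes) / len(piecewise_sizes)
        if avg < best_avg:
            best_mu, best_avg = mid, avg
    return best_mu, best_avg

# Two sample IPs with hypothetical tree-size functions:
f1 = ([0.0, 0.3, 1.0], [50, 10])   # small trees for mu >= 0.3
f2 = ([0.0, 0.6, 1.0], [20, 80])   # small trees for mu < 0.6
mu_hat, avg = best_parameter([f1, f2])   # mu_hat in (0.3, 0.6), avg == 15.0
```

Because each function is piecewise constant, checking one midpoint per refined interval suffices; no discretization of μ is needed.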
Learning algorithm guarantees
Let μ̂ be the algorithm's output given Õ(H³/ε² · ln(# variables)) samples. W.h.p.,
E_{Q~D}[tree-size(Q, μ̂)] − min_{μ∈[0,1]} E_{Q~D}[tree-size(Q, μ)] < ε.
Proof intuition: bound the algorithm class's intrinsic complexity (IC).
• The lemma bounds the number of "truly different" parameters.
• Parameters that are "the same" come from a simple set.
Learning theory then lets us translate IC into sample complexity.
Outline
1. Introduction
2. Branch-and-Bound
3. Learning algorithmsa. First-try: Discretization
b. Our Approachi. Single-parameter settings
ii. Multi-parameter settings
4. Experiments
5. Conclusion and Future Directions
Useful lemma: higher dimensions
Lemma: For any d scoring rules and any IP, a set ℋ of O((# variables)^(H+2)) hyperplanes partitions [0, 1]^d such that, for any connected component C of [0, 1]^d ∖ ℋ, B&B builds the same tree across all μ ∈ C.
Learning-theoretic guarantees
Fix d scoring rules and draw samples Q_1, …, Q_m ~ D. If m = Õ(H³/ε² · ln(d · # variables)), then w.h.p., for all μ ∈ [0, 1]^d,
|(1/m) Σ_{i=1}^m tree-size(Q_i, μ) − E_{Q~D}[tree-size(Q, μ)]| < ε.
Average tree size generalizes to expected tree size.
Outline
1. Introduction
2. Branch-and-Bound
3. Learning algorithms
4. Experiments
5. Conclusion and Future Directions
Experiments: Tuning the linear rule
Let: score_1(Q, i) = min(c_Q − c_{Q_i^-}, c_Q − c_{Q_i^+})
     score_2(Q, i) = max(c_Q − c_{Q_i^-}, c_Q − c_{Q_i^+})
Our parameterized rule: branch on the variable x_i maximizing
score(Q, i) = μ · score_1(Q, i) + (1 − μ) · score_2(Q, i).
This is the linear rule [Linderoth & Savelsbergh, 1999].
Experiments: Combinatorial auctions
Leyton-Brown, Pearson, and Shoham. Towards a universal test suite for combinatorial auction algorithms. In Proceedings of the Conference on Electronic Commerce (EC), 2000.
โRegionsโ generator:
400 bids, 200 goods, 100 instances
โArbitraryโ generator:
200 bids, 100 goods, 100 instances
Additional experiments
Facility location:
70 facilities, 70 customers,
500 instances
Clustering:
5 clusters, 35 nodes,
500 instances
Agnostically learning
linear separators:
50 points in ℝ²,
500 instances
Outline
1. Introduction
2. Branch-and-Bound
3. Learning algorithms
4. Experiments
5. Conclusion and Future Directions
Conclusion
• Study B&B, a widely used algorithm for combinatorial problems
• Show how to use ML to weight variable-selection rules
• Give the first sample-complexity bounds for tree-search algorithm configuration
  • Unlike prior work [Khalil et al. '16; Alvarez et al. '17], which is purely empirical
• Empirically show our approach can dramatically shrink tree size
  • We prove this improvement can even be exponential
• Theory applies to other tree-search algorithms, e.g., for solving CSPs
Future directions
How can we train faster?
• We don't want to build every tree B&B will make for every training instance.
• Can we train on small IPs and then apply the learned policies to large IPs?
What other tree-building applications can we apply our techniques to?
• E.g., building decision trees and taxonomies
How can we attack other learning problems in B&B?
• E.g., node-selection policies
Thank you! Questions?