106
Learning Programs: A Hierarchical Bayesian Approach ICML - Haifa, Israel June 24, 2010 Percy Liang Michael I. Jordan Dan Klein

Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just program themselves since I don't like programming

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Learning Programs:

A Hierarchical Bayesian Approach

ICML - Haifa, Israel June 24, 2010

Percy Liang Michael I. Jordan Dan Klein

Page 2: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Motivating Application: Repetitive Text Editing

I like programs, but I wish programswould just program themselves sinceI don’t like programming. =⇒

I like <i>programs</i>, but I wish <i>programs</i>would just <i>program</i> themselves sinceI don’t like <i>programming</i>.

2

Page 3: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Motivating Application: Repetitive Text Editing

I like programs, but I wish programswould just program themselves sinceI don’t like programming. =⇒

I like <i>programs</i>, but I wish <i>programs</i>would just <i>program</i> themselves sinceI don’t like <i>programming</i>.

Goal: Programming by Demonstration

If the user demonstrates italicizing the first occurrence,can we generalize to the remaining?

2

Page 4: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Motivating Application: Repetitive Text Editing

I like programs, but I wish programswould just program themselves sinceI don’t like programming. =⇒

I like <i>programs</i>, but I wish <i>programs</i>would just <i>program</i> themselves sinceI don’t like <i>programming</i>.

Goal: Programming by Demonstration

If the user demonstrates italicizing the first occurrence,can we generalize to the remaining?

Solution: represent task by a program to be learned

1. Move to next occurrence of word with prefix program2. Insert <i>3. Move to end of word4. Insert </i>

2

Page 5: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Motivating Application: Repetitive Text Editing

I like programs, but I wish programswould just program themselves sinceI don’t like programming. =⇒

I like <i>programs</i>, but I wish <i>programs</i>would just <i>program</i> themselves sinceI don’t like <i>programming</i>.

Goal: Programming by Demonstration

If the user demonstrates italicizing the first occurrence,can we generalize to the remaining?

Solution: represent task by a program to be learned

1. Move to next occurrence of word with prefix program2. Insert <i>3. Move to end of word4. Insert </i>

Challenge: learn from very few examples2

Page 6: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

General Setup

Goal:

(X1, Y1)· · ·

(Xn, Yn)

Training data

3

Page 7: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

General Setup

Goal:

(X1, Y1)· · ·

(Xn, Yn)=⇒ Z such that (Z Xj) = Yj

Training data Consistent program

3

Page 8: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

General Setup

Goal:

(X1, Y1)· · ·

(Xn, Yn)=⇒ Z such that (Z Xj) = Yj

Training data Consistent program

Challenge:

When n small, many programs consistent with training data.

I like <i>programs</i>, but I wish programswould just program themselves sinceI don’t like programming.

Move to beginning of third word, ...Move to beginning of word after like, ...Move 7 spaces to the right, ...Move to word with prefix program, ...· · ·

3

Page 9: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

General Setup

Goal:

(X1, Y1)· · ·

(Xn, Yn)=⇒ Z such that (Z Xj) = Yj

Training data Consistent program

Challenge:

When n small, many programs consistent with training data.

I like <i>programs</i>, but I wish programswould just program themselves sinceI don’t like programming.

Move to beginning of third word, ...Move to beginning of word after like, ...Move 7 spaces to the right, ...Move to word with prefix program, ...· · ·

Which program to choose?3

Page 10: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Key Intuition

One task:

Want to choose a program which is simple (Occam’s razor).

Examples =⇒ Z

4

Page 11: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Key Intuition

One task:

Want to choose a program which is simple (Occam’s razor).

Examples =⇒ Z

What’s the right complexity metric (prior)?

4

Page 12: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Key Intuition

One task:

Want to choose a program which is simple (Occam’s razor).

Examples =⇒ Z

What’s the right complexity metric (prior)? No general answer.

4

Page 13: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Key Intuition

One task:

Want to choose a program which is simple (Occam’s razor).

Examples =⇒ Z

What’s the right complexity metric (prior)? No general answer.

Multiple tasks:

Task 1 examples =⇒ Z1

· · · · · ·Task K examples =⇒ ZK

4

Page 14: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Key Intuition

One task:

Want to choose a program which is simple (Occam’s razor).

Examples =⇒ Z

What’s the right complexity metric (prior)? No general answer.

Multiple tasks:

Task 1 examples =⇒ Z1

· · · · · ·Task K examples =⇒ ZK

Find programs that share common subprograms.

4

Page 15: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Key Intuition

One task:

Want to choose a program which is simple (Occam’s razor).

Examples =⇒ Z

What’s the right complexity metric (prior)? No general answer.

Multiple tasks:

Task 1 examples =⇒ Z1

· · · · · ·Task K examples =⇒ ZK

Find programs that share common subprograms.

• Programs do tend to share common components.

4

Page 16: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Key Intuition

One task:

Want to choose a program which is simple (Occam’s razor).

Examples =⇒ Z

What’s the right complexity metric (prior)? No general answer.

Multiple tasks:

Task 1 examples =⇒ Z1

· · · · · ·Task K examples =⇒ ZK

Find programs that share common subprograms.

• Programs do tend to share common components.

• Penalize joint complexity of all K programs.

4

Page 17: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Outline of Proposed Solution

Program representation: What are subprograms?

Combinatory logic +

∗ I

B I

S

B 1C

5

Page 18: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Outline of Proposed Solution

Program representation: What are subprograms?

Combinatory logic +

∗ I

B I

S

B 1C

Probabilistic model: Which programs are favorable?

Nonparametric Bayesα0

p0

{Gt} Zi Yij Xij

nK

5

Page 19: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Outline of Proposed Solution

Program representation: What are subprograms?

Combinatory logic +

∗ I

B I

S

B 1C

Probabilistic model: Which programs are favorable?

Nonparametric Bayesα0

p0

{Gt} Zi Yij Xij

nK

Statistical inference: How do we search for good programs?

MCMCx y

r1Br z

r0

⇒ x

y z

r′1

r′0r

5

Page 20: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Outline of Proposed Solution

Program representation: What are subprograms?

Combinatory logic +

∗ I

B I

S

B 1C

Probabilistic model: Which programs are favorable?

Nonparametric Bayesα0

p0

{Gt} Zi Yij Xij

nK

Statistical inference: How do we search for good programs?

MCMCx y

r1Br z

r0

⇒ x

y z

r′1

r′0r

6

Page 21: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Representation: What Language?

Goal: allow sharing of subprograms

7

Page 22: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Representation: What Language?

Goal: allow sharing of subprograms

Our language:

Combinatory logic [Schonfinkel, 1924]

7

Page 23: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Representation: What Language?

Goal: allow sharing of subprograms

Our language:

Combinatory logic [Schonfinkel, 1924]

+ higher-order combinators (new)

+ routing intuition, visual representation (new)

7

Page 24: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Representation: What Language?

Goal: allow sharing of subprograms

Our language:

Combinatory logic [Schonfinkel, 1924]

+ higher-order combinators (new)

+ routing intuition, visual representation (new)

Properties: no mutation, no variables ⇒ simple semantics

7

Page 25: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Representation: What Language?

Goal: allow sharing of subprograms

Our language:

Combinatory logic [Schonfinkel, 1924]

+ higher-order combinators (new)

+ routing intuition, visual representation (new)

Properties: no mutation, no variables ⇒ simple semantics

Result:

• Programs are trees

• Subprograms are subtrees

7

Page 26: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with No Arguments

Example: compute min(3, 4)

8

Page 27: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with No Arguments

Example: compute min(3, 4)(if (< 3 4) 3 4)

8

Page 28: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with No Arguments

Example: compute min(3, 4)(if (< 3 4) 3 4)

if

< 34

34

8

Page 29: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with No Arguments

Example: compute min(3, 4)(if (< 3 4) 3 4)

if

< 34

34

General:

x y⇒ result of applying function x to argument y

8

Page 30: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with No Arguments

Example: compute min(3, 4)(if (< 3 4) 3 4)

if

< 34

34

General:

x y⇒ result of applying function x to argument y

Arguments are curried

8

Page 31: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with No Arguments

Example: compute min(3, 4)(if (< 3 4) 3 4) (if true 3 4)

if

< 34

34⇒

if true3

4

General:

x y⇒ result of applying function x to argument y

Arguments are curried

8

Page 32: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with No Arguments

Example: compute min(3, 4)(if (< 3 4) 3 4) (if true 3 4)

if

< 34

34⇒

if true3

4 ⇒ 3

General:

x y⇒ result of applying function x to argument y

Arguments are curried

8

Page 33: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with One Argument

Example: x 7→ x2 + 1

9

Page 34: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with One Argument

Example: x 7→ x2 + 1

λx . +

∗ xx

1

Lambda calculus

9

Page 35: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with One Argument

Example: x 7→ x2 + 1

λx . +

∗ xx

1

+

∗ I

B I

S

B 1C

Lambda calculus Combinatory logic

9

Page 36: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with One Argument

Example: x 7→ x2 + 1

λx . +

∗ xx

1

+

∗ I

B I

S

B 1C

Lambda calculus Combinatory logic

Intuition:

Combinators {B,C,S, I} encode placement of arguments

9

Page 37: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with One Argument

Example: x 7→ x2 + 1

λx . +

∗ xx

1

+

∗ I

B I

S

B 1C

Lambda calculus Combinatory logic

Intuition:

Combinators {B,C,S, I} encode placement of arguments

Semantics:

x y

r⇔ (r x y)

r ∈ {B,C,S, I} 9

Page 38: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with One Argument

Example: x 7→ x2 + 1

λx . +

∗ xx

1

+

∗ I

B I

S

B 1C

Lambda calculus Combinatory logic

Intuition:

Combinators {B,C,S, I} encode placement of arguments

Semantics:

x y

r⇔ (r x y)

r ∈ {B,C,S, I}

Rules:(B f g x) = (f (g x))... 9

Page 39: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with One Argument

Example: Apply x 7→ x2 + 1 to 5

10

Page 40: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with One Argument

Example: Apply x 7→ x2 + 1 to 5

+

∗ I

B I

S

B 1C 5

10

Page 41: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with One Argument

Example: Apply x 7→ x2 + 1 to 5

+

∗ I

B I

S

B 1C 5

x yC z ⇔

x zy

route left

10

Page 42: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with One Argument

Example: Apply x 7→ x2 + 1 to 5

+

∗ I

B I

S

B 51 x y

C z ⇔x z

y

route left

10

Page 43: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with One Argument

Example: Apply x 7→ x2 + 1 to 5

+

∗ I

B I

S

B 51 x y

C z ⇔x z

y

route left

x yB z ⇔ x

y z

route right

10

Page 44: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with One Argument

Example: Apply x 7→ x2 + 1 to 5

+

∗ I

B I

S 5

1 x yC z ⇔

x zy

route left

x yB z ⇔ x

y z

route right

10

Page 45: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with One Argument

Example: Apply x 7→ x2 + 1 to 5

+

∗ I

B I

S 5

1 x yC z ⇔

x zy

route left

x yB z ⇔ x

y z

route right

x yS z ⇔

x z y z

route left and right10

Page 46: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with One Argument

Example: Apply x 7→ x2 + 1 to 5

+

∗ I

B 5 I 5

1 x yC z ⇔

x zy

route left

x yB z ⇔ x

y z

route right

x yS z ⇔

x z y z

route left and right10

Page 47: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with One Argument

Example: Apply x 7→ x2 + 1 to 5

+

∗ I

B 5 I 5

1 x yC z ⇔

x zy

route left

x yB z ⇔ x

y z

route right

x yS z ⇔

x z y z

route left and right

I x ⇔ x

stop10

Page 48: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with One Argument

Example: Apply x 7→ x2 + 1 to 5

+

∗ I

B 55

1 x yC z ⇔

x zy

route left

x yB z ⇔ x

y z

route right

x yS z ⇔

x z y z

route left and right

I x ⇔ x

stop10

Page 49: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with One Argument

Example: Apply x 7→ x2 + 1 to 5

+

∗ 55

1

x yC z ⇔

x zy

route left

x yB z ⇔ x

y z

route right

x yS z ⇔

x z y z

route left and right

I x ⇔ x

stop10

Page 50: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with Multiple Arguments

Example: (x, y) 7→ min(x, y)

11

Page 51: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with Multiple Arguments

Example: (x, y) 7→ min(x, y)Classical: first-order combinators {B,C,S, I}

Complete basis, so can implement min, but cumbersome

11

Page 52: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with Multiple Arguments

Example: (x, y) 7→ min(x, y)Classical: first-order combinators {B,C,S, I}

Complete basis, so can implement min, but cumbersome

New: higher-order combinators {B,C,S, I}∗

Infinite basis, but resulting programs are more intuitive

e.g., CS routes 1st arg. left, 2nd arg. left and right

11

Page 53: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with Multiple Arguments

Example: (x, y) 7→ min(x, y)Classical: first-order combinators {B,C,S, I}

Complete basis, so can implement min, but cumbersome

New: higher-order combinators {B,C,S, I}∗

Infinite basis, but resulting programs are more intuitive

e.g., CS routes 1st arg. left, 2nd arg. left and right

if <

BB I

SC I

CS

11

Page 54: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with Multiple Arguments

Example: (x, y) 7→ min(x, y)Classical: first-order combinators {B,C,S, I}

Complete basis, so can implement min, but cumbersome

New: higher-order combinators {B,C,S, I}∗

Infinite basis, but resulting programs are more intuitive

e.g., CS routes 1st arg. left, 2nd arg. left and right

if <

BB I

SC I

CS 34

11

Page 55: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with Multiple Arguments

Example: (x, y) 7→ min(x, y)Classical: first-order combinators {B,C,S, I}

Complete basis, so can implement min, but cumbersome

New: higher-order combinators {B,C,S, I}∗

Infinite basis, but resulting programs are more intuitive

e.g., CS routes 1st arg. left, 2nd arg. left and right

if <

BB I

SC I

CS 34

11

Page 56: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with Multiple Arguments

Example: (x, y) 7→ min(x, y)Classical: first-order combinators {B,C,S, I}

Complete basis, so can implement min, but cumbersome

New: higher-order combinators {B,C,S, I}∗

Infinite basis, but resulting programs are more intuitive

e.g., CS routes 1st arg. left, 2nd arg. left and right

if< 3

B 3C I

S 4

11

Page 57: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with Multiple Arguments

Example: (x, y) 7→ min(x, y)Classical: first-order combinators {B,C,S, I}

Complete basis, so can implement min, but cumbersome

New: higher-order combinators {B,C,S, I}∗

Infinite basis, but resulting programs are more intuitive

e.g., CS routes 1st arg. left, 2nd arg. left and right

if< 3

B 3C I

S 4

11

Page 58: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Programs with Multiple Arguments

Example: (x, y) 7→ min(x, y)Classical: first-order combinators {B,C,S, I}

Complete basis, so can implement min, but cumbersome

New: higher-order combinators {B,C,S, I}∗

Infinite basis, but resulting programs are more intuitive

e.g., CS routes 1st arg. left, 2nd arg. left and right

if

< 34

34

11

Page 59: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Using Combinators for Refactoring

min max

if <

BB I

SC I

CS

if >

BB I

SC I

CS

12

Page 60: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Using Combinators for Refactoring

min max

if <

BB I

SC I

CS

if >

BB I

SC I

CS

No significant sharing of subtrees (subprograms)

12

Page 61: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Using Combinators for Refactoring

min max

if <

BB I

SC I

CS

if >

BB I

SC I

CS

No significant sharing of subtrees (subprograms)

Refactored:

if I

BBB I

CSC I

CCS <

if I

BBB I

CSC I

CCS >

12

Page 62: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Using Combinators for Refactoring

min max

if <

BB I

SC I

CS

if >

BB I

SC I

CS

No significant sharing of subtrees (subprograms)

Refactored:

if I

BBB I

CSC I

CCS <

if I

BBB I

CSC I

CCS >

Fruitful sharing of subtrees (subprograms)12

Page 63: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Summary

Introduced new combinatory logic basis (intuition: routing)

13

Page 64: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Summary

Introduced new combinatory logic basis (intuition: routing)

Purpose of these combinators:

• Represent multi-argument functions

• Allow refactoring to expose common substructures

13

Page 65: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Summary

Introduced new combinatory logic basis (intuition: routing)

Purpose of these combinators:

• Represent multi-argument functions

• Allow refactoring to expose common substructures

Achieved uniformity: Every subtree is a subprogram

13

Page 66: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Outline of Proposed Solution

Program representation: What are subprograms?

Combinatory logic +

∗ I

B I

S

B 1C

Probabilistic model: Which programs are favorable?

Nonparametric Bayesα0

p0

{Gt} Zi Yij Xij

nK

Statistical inference: How do we search for good programs?

MCMCx y

r1Br z

r0

⇒ x

y z

r′1

r′0r

14

Page 67: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Probabilistic Context-Free Grammars

GenIndep(t): [returns a combinator of type t]

15

Page 68: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Probabilistic Context-Free Grammars

GenIndep(t): [returns a combinator of type t]With probability λ0:

15

Page 69: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Probabilistic Context-Free Grammars

GenIndep(t): [returns a combinator of type t]With probability λ0:

Return a random primitive combinator (e.g., +, 3, I)

15

Page 70: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Probabilistic Context-Free Grammars

GenIndep(t): [returns a combinator of type t]With probability λ0:

Return a random primitive combinator (e.g., +, 3, I)Else:

Choose a type sx← GenIndep(s→ t)

15

Page 71: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Probabilistic Context-Free Grammars

GenIndep(t): [returns a combinator of type t]With probability λ0:

Return a random primitive combinator (e.g., +, 3, I)Else:

Choose a type sx← GenIndep(s→ t)y ← GenIndep(s)

15

Page 72: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Probabilistic Context-Free Grammars

GenIndep(t): [returns a combinator of type t]With probability λ0:

Return a random primitive combinator (e.g., +, 3, I)Else:

Choose a type sx← GenIndep(s→ t)y ← GenIndep(s)return (x, y)

15

Page 73: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Probabilistic Context-Free Grammars

GenIndep(t): [returns a combinator of type t]With probability λ0:

Return a random primitive combinator (e.g., +, 3, I)Else:

Choose a type sx← GenIndep(s→ t)y ← GenIndep(s)return (x, y)

Example:

GenIndep(int→ int) =⇒ + 1

15

Page 74: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Probabilistic Context-Free Grammars

GenIndep(t): [returns a combinator of type t]With probability λ0:

Return a random primitive combinator (e.g., +, 3, I)Else:

Choose a type sx← GenIndep(s→ t)y ← GenIndep(s)return (x, y)

Example:

GenIndep(int→ int) =⇒ ∗

− 31

15

Page 75: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Probabilistic Context-Free Grammars

GenIndep(t): [returns a combinator of type t]With probability λ0:

Return a random primitive combinator (e.g., +, 3, I)Else:

Choose a type sx← GenIndep(s→ t)y ← GenIndep(s)return (x, y)

Example:

GenIndep(int→ int) =⇒ ∗

− 31

Problem: No encouragement to share subprograms

15

Page 76: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Adaptor Grammars [Johnson, 2007]

Ct ← [ ] for each type t [cached list of combinators]

16

Page 77: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Adaptor Grammars [Johnson, 2007]

Ct ← [ ] for each type t [cached list of combinators](notation: return∗ c adds c to Ct and returns c)

16

Page 78: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Adaptor Grammars [Johnson, 2007]

Ct ← [ ] for each type t [cached list of combinators](notation: return∗ c adds c to Ct and returns c)

GenCache(t): [returns a combinator of type t]

With probability α0+Ntdα0+|Ct|

:

16

Page 79: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Adaptor Grammars [Johnson, 2007]

Ct ← [ ] for each type t [cached list of combinators](notation: return∗ c adds c to Ct and returns c)

GenCache(t): [returns a combinator of type t]

With probability α0+Ntdα0+|Ct|

:

With probability λ0:Return∗ a random primitive combinator (e.g., +, 3, I)

Else:Choose a type sx← GenCache(s→ t)y ← GenCache(s)Return∗ (x, y)

Else:

16

Page 80: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Adaptor Grammars [Johnson, 2007]

Ct ← [ ] for each type t [cached list of combinators](notation: return∗ c adds c to Ct and returns c)

GenCache(t): [returns a combinator of type t]

With probability α0+Ntdα0+|Ct|

:

With probability λ0:Return∗ a random primitive combinator (e.g., +, 3, I)

Else:Choose a type sx← GenCache(s→ t)y ← GenCache(s)Return∗ (x, y)

Else:

Return∗ z ∈ Ct with probability Mz−d|Ct|−Ntd

16

Page 81: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Adaptor Grammars [Johnson, 2007]

Ct ← [ ] for each type t [cached list of combinators](notation: return∗ c adds c to Ct and returns c)

GenCache(t): [returns a combinator of type t]

With probability α0+Ntdα0+|Ct|

:

With probability λ0:Return∗ a random primitive combinator (e.g., +, 3, I)

Else:Choose a type sx← GenCache(s→ t)y ← GenCache(s)Return∗ (x, y)

Else:

Return∗ z ∈ Ct with probability Mz−d|Ct|−Ntd

Interpretation of cache Ct: library of generally useful(unnamed) subroutines which are reused.

16

Page 82: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Outline of Proposed Solution

Program representation: What are subprograms?

Combinatory logic +

∗ I

B I

S

B 1C

Probabilistic model: Which programs are favorable?

Nonparametric Bayesα0

p0

{Gt} Zi Yij Xij

nK

Statistical inference: How do we search for good programs?

MCMCx y

r1Br z

r0

⇒ x

y z

r′1

r′0r

17

Page 83: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Inference via MCMC

User provides tree structure that encodes set of programs U

Objective: sample from posterior given program in U

18

Page 84: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Inference via MCMC

User provides tree structure that encodes set of programs U

Objective: sample from posterior given program in U

Use Metropolis-Hastings

Proposal: sample a random program transformation

18

Page 85: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Inference via MCMC

User provides tree structure that encodes set of programs U

Objective: sample from posterior given program in U

Use Metropolis-Hastings

Proposal: sample a random program transformation

Program transformations maintain invariant that

program is correct (likelihood is 1)

18

Page 86: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Inference via MCMC

User provides tree structure that encodes set of programs U

Objective: sample from posterior given program in U

Use Metropolis-Hastings

Proposal: sample a random program transformation

Program transformations maintain invariant that

program is correct (likelihood is 1)

Two types of transformations:

1. Switching

2. Refactoring

18

Page 87: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Program transformations (MCMC moves)

Switching: Change content, preserve empirical semantics

19

Page 88: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Program transformations (MCMC moves)

Switching: Change content, preserve empirical semantics

Data: {(2, 8)}

19

Page 89: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Program transformations (MCMC moves)

Switching: Change content, preserve empirical semantics

Data: {(2, 8)} ∗+ 2

S

[x 7→ x(x + 2)]

19

Page 90: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Program transformations (MCMC moves)

Switching: Change content, preserve empirical semantics

Data: {(2, 8)} ∗+ 2

S

[x 7→ x(x + 2)]

⇔∗∗ I

S

S

[x 7→ x3]

19

Page 91: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Program transformations (MCMC moves)

Switching: Change content, preserve empirical semantics

Data: {(2, 8)} ∗+ 2

S

[x 7→ x(x + 2)]

⇔∗∗ I

S

S

[x 7→ x3]

Purpose: change generalization

19

Page 92: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Program transformations (MCMC moves)

Switching: Change content, preserve empirical semantics

Data: {(2, 8)} ∗+ 2

S

[x 7→ x(x + 2)]

⇔∗∗ I

S

S

[x 7→ x3]

Purpose: change generalization

Refactoring: Change form, preserve total semantics

19

Page 93: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Program transformations (MCMC moves)

Switching: Change content, preserve empirical semantics

Data: {(2, 8)} ∗+ 2

S

[x 7→ x(x + 2)]

⇔∗∗ I

S

S

[x 7→ x3]

Purpose: change generalization

Refactoring: Change form, preserve total semantics

∗+ 2

S

[x 7→ x(x + 2)]

19

Page 94: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Program transformations (MCMC moves)

Switching: Change content, preserve empirical semantics

Data: {(2, 8)} ∗+ 2

S

[x 7→ x(x + 2)]

⇔∗∗ I

S

S

[x 7→ x3]

Purpose: change generalization

Refactoring: Change form, preserve total semantics

∗+ 2

S

[x 7→ x(x + 2)]

⇔ ∗ +BS 2

[x 7→ x(x + 2)]

19

Page 95: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Program transformations (MCMC moves)

Switching: Change content, preserve empirical semantics

Data: {(2, 8)} ∗+ 2

S

[x 7→ x(x + 2)]

⇔∗∗ I

S

S

[x 7→ x3]

Purpose: change generalization

Refactoring: Change form, preserve total semantics

∗+ 2

S

[x 7→ x(x + 2)]

⇔ ∗ +BS 2

[x 7→ x(x + 2)]

Purpose: expose different subprograms for sharing19

Page 96: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Text Editing Experiments

Setup:

Dataset of [Lau et al., 2003]K = 24 tasksEach task: train on 2–5 examples, test on u 13 examples10 random trials

20

Page 97: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Text Editing Experiments

Setup:

Dataset of [Lau et al., 2003]K = 24 tasksEach task: train on 2–5 examples, test on u 13 examples10 random trials

Example task:

Cardinals 5, Pirates 2.⇒

GameScore[ winner ’Cardinals’; loser ’Pirates’; scores [5, 2]].

20

Page 98: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Experimental Results

Uniform prior

Independent prior

Joint prior

21

Page 99: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Experimental Results

2 3 4 5

5

10

15

20

25

erro

r

Uniform prior

Independent prior

Joint prior

21

Page 100: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Experimental Results

2 3 4 5

5

10

15

20

25

erro

r

Uniform prior

Independent prior

Joint prior

Observations:

• Independent prior is even worse than uniform prior

21

Page 101: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Experimental Results

2 3 4 5

5

10

15

20

25

erro

r

Uniform prior

Independent prior

Joint prior

Observations:

• Independent prior is even worse than uniform prior

• Joint prior (multi-task learning) is effective

21

Page 102: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Summary

X ⇒ ⇒ Y

22

Page 103: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Summary

X ⇒ program ⇒ Y

22

Page 104: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Summary

X ⇒ program ⇒ Y

Key challenge: learn programs from few examples

22

Page 105: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Summary

X ⇒ program ⇒ Y

Key challenge: learn programs from few examples

Main idea: share subprograms across multiple tasks

22

Page 106: Learning Programs: A Hierarchical Bayesian Approachpliang/papers/programs-icml2010-talk.pdf · would just  program  themselves since I don't like  programming

Summary

X ⇒ program ⇒ Y

Key challenge: learn programs from few examples

Main idea: share subprograms across multiple tasks

Tools:

• Combinatory logic: expose subprograms to be shared

• Adaptor grammars: encourage sharing of subprograms

•Metropolis-Hastings: proposals are program transformations

22