Sumit Gulwani

Programming by Examples: Applications, Algorithms & Ambiguity Resolution

Stanford Lecture March 2017

Motivation

99% of computer users cannot program! They struggle with simple repetitive tasks.

Spreadsheet help forums

Typical help-forum interaction

300_w5_aniSh_c1_b → w5

=MID(B1,5,2)

300_w30_aniSh_c1_b → w30

=MID(B1,FIND("_",$B:$B)+1, FIND("_",REPLACE($B:$B,1,FIND("_",$B:$B), ""))-1)

Flash Fill (Excel 2013 feature) demo

“Automating string processing in spreadsheets using input-output examples”; POPL 2011; Sumit Gulwani

Number and DateTime Transformations

Input          Output (Half hour bucket)
0d 5h 26m      5:00-5:30
0d 4h 57m      4:30-5:00
0d 4h 27m      4:00-4:30
0d 3h 57m      3:30-4:00
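For comparison, a hand-written version of the half-hour bucketing task might look like the Python sketch below. The function name `half_hour_bucket` and the assumption that every input has the exact shape `<d>d <h>h <m>m` are illustrative and do not come from the slides.

```python
import re

def half_hour_bucket(duration: str) -> str:
    """Map a string like '0d 5h 26m' to its half-hour bucket, e.g. '5:00-5:30'.

    Assumption: the day component is ignored, as in the four rows of the table.
    """
    match = re.fullmatch(r"(\d+)d (\d+)h (\d+)m", duration)
    if match is None:
        raise ValueError(f"unexpected duration format: {duration!r}")
    hours, minutes = int(match.group(2)), int(match.group(3))
    start = (minutes // 30) * 30  # 0 or 30
    # A bucket starting at :30 ends on the next full hour.
    end_hours, end_minutes = (hours, 30) if start == 0 else (hours + 1, 0)
    return f"{hours}:{start:02d}-{end_hours}:{end_minutes:02d}"

for row in ("0d 5h 26m", "0d 4h 57m", "0d 4h 27m", "0d 3h 57m"):
    print(row, "->", half_hour_bucket(row))
```

Writing this by hand takes parsing, integer arithmetic, and a boundary case, which is the kind of effort that examples are meant to replace.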

Synthesizing Number Transformations from Input-Output Examples; CAV 2012; Singh, Gulwani

Input        Output (Round to 2 decimal places)
123.4567     123.46
123.4        123.40
78.234       78.23

Excel/C#:    #.00
Python/C:    .2f
Java:        #.##
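As a quick check of the Python entry, the `.2f` format specifier reproduces the three outputs of the rounding table. This is a minimal sketch; the helper name `round_2dp` is illustrative.

```python
def round_2dp(value: float) -> str:
    """Format a number with exactly two digits after the decimal point."""
    return format(value, ".2f")

for x in (123.4567, 123.4, 78.234):
    print(x, "->", round_2dp(x))
```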

To get Started!

Data Science Class Assignment

FlashExtract

Ships inside two Microsoft products:
•  PowerShell: ConvertFrom-String cmdlet
•  Custom Log, Custom Field

“FlashExtract: A Framework for data extraction by examples”; PLDI 2014; Vu Le, Sumit Gulwani

FlashExtract Demo

[Figure: “Start with” and “End goal” tables]

Trifacta facilitates the task through a series of steps, suggesting small transformations at each step:

1. Split on “:” Delimiter
2. Delete Empty Rows
3. Fill Values Down
4. Pivot Number on Type

From: Skills of the Agile Data Wrangler (tutorial by Hellerstein and Heer)

FlashRelate

Table Re-formatting

FlashRelate Demo

“FlashRelate: Extracting Relational Data from Semi-Structured Spreadsheets Using Examples”; PLDI 2015; Barowy, Gulwani, Hart, Zorn

Killer Application: Data Wrangling

Data is the new oil. Data scientists spend 80% of their time wrangling data.

Extraction
•  FlashExtract: Tabular data from log files [PLDI 2014]
•  FlashRelate: Tabular data from spreadsheets [PLDI 2015]

Transformation
•  FlashFill: Syntactic String Transformations [POPL 2011]
•  Number Transformations [CAV 2012]
•  Date Transformations [POPL 2016]
•  Semantic String Transformations [VLDB 2012]
•  Normalization Transformations [IJCAI 2015]

Formatting
•  Table layout transformations [PLDI 2011]
•  Repetitive editing in Powerpoint [AAAI 2014]

Programming-by-Examples Architecture

[Diagram: Example-based intent and DSL D feed the Program Synthesizer, which produces a program set (a sub-DSL of D).]

Domain-specific Language (DSL)

•  Balanced Expressiveness
    –  Expressive enough to cover wide range of tasks
    –  Restricted enough to enable efficient search

•  Restricted set of operators
    –  those with small inverse sets

•  Restricted syntactic composition of those operators

•  Natural computation patterns
    –  Increased user understanding/confidence
    –  Enables selection between programs, editing

Flash Fill DSL (String Transformations)

top-level expr        T := if-then-else(B, C, T) | C
condition-free expr   C := Concatenate(A, C) | A
atomic expression     A := SubStr(X, P, P) | ConstantString
input string          X
position expression   P := K | Pos(X, R1, R2, K)
Boolean expression    B

Pos(X, R1, R2, K): Kth position in X whose left/right side matches with R1/R2.
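To make the operators concrete, here is a minimal Python sketch of `SubStr`, `Pos`, and `Concatenate`. It assumes that R1 and R2 are ordinary regular expressions and that positions are 0-based offsets between characters. It is not the Flash Fill implementation: the function names, the regex-based matching, and the `extract_week` program are illustrative choices.

```python
import re

def pos(x: str, r1: str, r2: str, k: int) -> int:
    """K-th (1-based) position p in x such that a match of r1 ends at p
    and a match of r2 starts at p. Raises IndexError if there is none."""
    hits = [
        p for p in range(len(x) + 1)
        if re.search(rf"(?:{r1})\Z", x[:p]) and re.match(r2, x[p:])
    ]
    return hits[k - 1]

def sub_str(x: str, p1: int, p2: int) -> str:
    """SubStr(X, P, P): the substring of x between two positions."""
    return x[p1:p2]

def concatenate(a: str, c: str) -> str:
    """Concatenate(A, C): join an atomic result with the rest."""
    return a + c

def extract_week(x: str) -> str:
    """SubStr(x, Pos(x, "_", "", 1), Pos(x, "", "_", 2)):
    from just after the 1st underscore to just before the 2nd."""
    return sub_str(x, pos(x, "_", "", 1), pos(x, "", "_", 2))

print(extract_week("300_w5_aniSh_c1_b"))
print(extract_week("300_w30_aniSh_c1_b"))
```

Because the positions are defined relative to the underscores rather than as fixed offsets, the same program handles both `w5` and `w30`, unlike the fixed-offset formula `=MID(B1,5,2)` from the help-forum example.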

FlashExtract DSL (Sequence Extraction)

String d → List(String)

Seq expr              E    := Map(N, λz: S[z])
                            | Merge(E1, E2)
substr expr           S[z] := …
some lines            N    := FilterByPosition(L, init, iter)
                            | Filter(L, λz: F[z])
                            | Filter(L, λy: F[prevLine(y)])
line filter function  F[z] := Contains(z, r, K)
                            | startsWith(z, r)
all lines             L    := Split(d, "\n")

“FlashExtract: A Framework for data extraction by examples”; PLDI 2014; Vu Le, Sumit Gulwani
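As a rough illustration of how a program in this DSL runs, the Python sketch below evaluates one expression of the shape `Map(Filter(Split(d, "\n"), λz: startsWith(z, r)), λz: S[z])`. Treating `r` as a regular expression, the sample document, and the choice of `S[z]` (the text after the first colon) are all assumptions made for the example.

```python
import re
from typing import Callable, List

def split_lines(d: str) -> List[str]:
    """Split(d, "\\n"): all lines of the document."""
    return d.split("\n")

def starts_with(z: str, r: str) -> bool:
    """startsWith(z, r): does line z begin with a match of regex r?"""
    return re.match(r, z) is not None

def filter_lines(lines: List[str], keep: Callable[[str], bool]) -> List[str]:
    """Filter(L, λz: F[z]): keep the lines accepted by the line filter."""
    return [z for z in lines if keep(z)]

def map_lines(lines: List[str], s: Callable[[str], str]) -> List[str]:
    """Map(N, λz: S[z]): apply the substring function to every kept line."""
    return [s(z) for z in lines]

def after_colon(z: str) -> str:
    """Hypothetical S[z]: the text after the first colon, trimmed."""
    return z.split(":", 1)[1].strip()

def run(d: str) -> List[str]:
    kept = filter_lines(split_lines(d), lambda z: starts_with(z, r"Name"))
    return map_lines(kept, after_colon)

doc = "Name: Ben Zorn\nCity: Redmond\nName: Vu Le"
print(run(doc))
```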

Search Methodology

Goal: Set of expressions of kind $e$ that satisfy spec $\varphi$ [denoted $[e \models \varphi]$]

$e$: DSL (top-level) expression

$\varphi$: Conjunction of (input state $\sigma$, output value $v$) [denoted $\sigma \rightsquigarrow v$]

Methodology: Based on divide-and-conquer style problem decomposition.

•  $[e \models \varphi]$ is reduced to simpler problems (over sub-expressions of $e$ or sub-constraints of $\varphi$).

•  Top-down (as opposed to bottom-up enumerative search).

“FlashMeta: A Framework for Inductive Program Synthesis”; [OOPSLA 2015] Alex Polozov, Sumit Gulwani

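Problem Reduction Rules

Let $e$ be a non-terminal defined as $e := e_1 \mid e_2$.

$[e \models \varphi] = \mathrm{Union}([e_1 \models \varphi],\ [e_2 \models \varphi])$

$[e \models \varphi_1 \wedge \varphi_2] = \mathrm{Intersect}([e \models \varphi_1],\ [e \models \varphi_2])$

An alternative strategy:

$[e \models \varphi_1 \wedge \varphi_2] = [e' \models \varphi_2]$, where $e' = \mathrm{TopSymbol}([e \models \varphi_1])$

$[e \models \varphi_1 \vee \varphi_2] = \mathrm{Union}([e \models \varphi_1],\ [e \models \varphi_2])$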
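The Union and Intersect rules can be checked on a toy example. The Python sketch below is not the FlashMeta algorithm: it stands in for program sets with brute-force filtering over two small, made-up pools of candidate programs (`E1` and `E2`), purely to show that the set identities hold.

```python
# Two hypothetical alternatives of a non-terminal e := e1 | e2,
# each given as a small pool of named candidate programs.
E1 = {"identity": lambda s: s, "const_ben": lambda s: "Ben"}
E2 = {"first_word": lambda s: s.split()[0], "lower": str.lower}
E = {**E1, **E2}

def program_set(candidates, spec):
    """Names of the candidates that satisfy every (input, output) pair in spec."""
    return {
        name for name, f in candidates.items()
        if all(f(inp) == out for inp, out in spec)
    }

phi1 = [("Ben Zorn", "Ben")]
phi2 = [("Rishabh Singh", "Rishabh")]

# Alternatives: [e |= phi] = Union([e1 |= phi], [e2 |= phi])
union_ok = program_set(E, phi1) == program_set(E1, phi1) | program_set(E2, phi1)

# Conjunction: [e |= phi1 and phi2] = Intersect([e |= phi1], [e |= phi2])
intersect_ok = program_set(E, phi1 + phi2) == (
    program_set(E, phi1) & program_set(E, phi2)
)

print(program_set(E, phi1), program_set(E, phi1 + phi2), union_ok, intersect_ok)
```

The constant program `const_ben` satisfies the first example but is eliminated by the second, which is the effect the Intersect rule captures.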

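Problem Reduction Rules

Inverse Set: Let $F$ be an n-ary operator.

$F^{-1}(v) = \{(u_1, \ldots, u_n) \mid F(u_1, \ldots, u_n) = v\}$

$[F(e_1, \ldots, e_n) \models \sigma \rightsquigarrow v] = \mathrm{Union}(\{F([e_1 \models \sigma \rightsquigarrow u_1], \ldots, [e_n \models \sigma \rightsquigarrow u_n]) \mid (u_1, \ldots, u_n) \in F^{-1}(v)\})$

$\mathrm{Concat}^{-1}(\text{"Abc"}) = \{(\text{"Abc"}, \epsilon),\ (\text{"Ab"}, \text{"c"}),\ (\text{"A"}, \text{"bc"}),\ (\epsilon, \text{"Abc"})\}$

$$
\begin{aligned}
[\mathrm{Concat}(M, N) \models (\sigma \rightsquigarrow \text{"Abc"})] = \mathrm{Union}(\{\,
& \mathrm{Concat}([M \models (\sigma \rightsquigarrow \text{"Abc"})],\ [N \models (\sigma \rightsquigarrow \epsilon)]), \\
& \mathrm{Concat}([M \models (\sigma \rightsquigarrow \text{"Ab"})],\ [N \models (\sigma \rightsquigarrow \text{"c"})]), \\
& \mathrm{Concat}([M \models (\sigma \rightsquigarrow \text{"A"})],\ [N \models (\sigma \rightsquigarrow \text{"bc"})]), \\
& \mathrm{Concat}([M \models (\sigma \rightsquigarrow \epsilon)],\ [N \models (\sigma \rightsquigarrow \text{"Abc"})])\,\})
\end{aligned}
$$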
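The inverse set of Concat is small enough to enumerate directly. The following Python sketch computes it for a given output string and lists the resulting sub-problems for the two arguments, written M and N as on the slide; the function name `concat_inverse` is illustrative.

```python
def concat_inverse(v: str) -> set:
    """All pairs (u1, u2) such that u1 + u2 == v, i.e. every split point of v."""
    return {(v[:i], v[i:]) for i in range(len(v) + 1)}

# Each pair yields one sub-problem per argument of Concat(M, N).
for u1, u2 in sorted(concat_inverse("Abc")):
    print(f"M must produce {u1!r}, N must produce {u2!r}")
```

A string of length n has n + 1 splits, so the inverse set stays small, which fits the earlier preference for operators with small inverse sets.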

Problem Reduction Rules

$\mathrm{Concat}^{-1}(\text{"Abc"}) = \{(\text{"Abc"}, \epsilon),\ (\text{"Ab"}, \text{"c"}),\ (\text{"A"}, \text{"bc"}),\ (\epsilon, \text{"Abc"})\}$

$\mathrm{IfThenElse}^{-1}(v) = \{(\mathrm{true}, v, \top),\ (\mathrm{false}, \top, v)\}$

The notion of inverse sets (and corresponding reduction rules) can be generalized to the conditional setting.

Consider the SubStr(x,i,j) operator:

$\mathrm{SubStr}^{-1}(\text{"Ab"}) = \ldots$

$\mathrm{SubStr}^{-1}(\text{"Ab"} \mid x = \text{"Ab cd Ab"}) = \{(0, 2),\ (6, 8)\}$
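Once the input string x is known, the inverse of SubStr becomes finite: it is the set of index pairs at which the output occurs in x. A minimal Python sketch, assuming 0-based start and exclusive end indices (the function name `substr_inverse` is illustrative):

```python
from typing import List, Tuple

def substr_inverse(v: str, x: str) -> List[Tuple[int, int]]:
    """All (i, j) such that x[i:j] == v, i.e. the conditional inverse of SubStr given x."""
    n = len(v)
    return [(i, i + n) for i in range(len(x) - n + 1) if x[i:i + n] == v]

print(substr_inverse("Ab", "Ab cd Ab"))  # [(0, 2), (6, 8)]
```

Each returned pair then becomes a pair of sub-specifications for the two position arguments.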

Generalizing output values to output properties

User specification
•  Examples: Conjunction of (input state, output value)
•  Inductive Spec: Conjunction of (input state, output property)

Search methodology
•  (Conditional) Inverse sets: Back-propagation of values
•  (Conditional) Witness functions: Back-propagation of properties

Problem Reduction

FlashExtract DSL

list of strings  T := Map(L, S)
substring fn     S := λy: …
list of lines    L := Filter(Split(d, "\n"), B)
boolean fn       B := λy: …

[Diagram: Spec for T → Spec for L ⋈ Spec for S]
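One way to picture this reduction: from a few example output strings for T, derive examples of lines that L must keep and (line, substring) pairs that S must satisfy. The Python sketch below makes the simplifying assumption that each example output comes from the first line that contains it; a real witness function would have to track every possibility. The name `decompose_map_spec` and the sample document are hypothetical.

```python
from typing import List, Tuple

def decompose_map_spec(
    d: str, outputs: List[str]
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split a spec for T := Map(L, S) into a spec for L and a spec for S."""
    lines = d.split("\n")
    spec_for_l: List[str] = []              # lines the filter must keep
    spec_for_s: List[Tuple[str, str]] = []  # (line, expected substring) pairs
    for out in outputs:
        # Assumption: the first line containing the output is its source line.
        line = next(z for z in lines if out in z)
        spec_for_l.append(line)
        spec_for_s.append((line, out))
    return spec_for_l, spec_for_s

doc = "Name: Ben Zorn\nCity: Redmond\nName: Vu Le"
spec_l, spec_s = decompose_map_spec(doc, ["Ben Zorn", "Vu Le"])
print(spec_l)
print(spec_s)
```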

Problem Reduction

SubStr grammar

substring expr  E := SubStr(y, P1, P2)
position expr   P := K | Pos(y, R1, R2, K)

[Diagram, using the example string “Redmond, WA”: Spec for E → Spec for P1 ⋈ Spec for P2]

Programming-by-Examples Architecture

[Diagram: Example-based intent, DSL D, and a ranking fn feed the Program Synthesizer, which produces a ranked program set (a sub-DSL of D).]

Basic ranking scheme

Prefer programs with lower Kolmogorov complexity
•  Prefer fewer constants.
•  Prefer smaller constants.

Input            Output
Rishabh Singh    Rishabh
Ben Zorn         Ben

•  1st Word
•  If (input = “Rishabh Singh”) then “Rishabh” else “Ben”
•  “Rishabh”
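The two preferences can be turned into a simple sort key. The Python sketch below ranks the three candidates by (number of constants, total size of constants). The `Candidate` record and the decision to count only string constants, so that “1st Word” has none, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Candidate:
    description: str
    constants: List[str]  # string constants appearing in the program

def cost(c: Candidate) -> Tuple[int, int]:
    """Lower is better: fewer constants first, then smaller constants."""
    return (len(c.constants), sum(len(k) for k in c.constants))

candidates = [
    Candidate('If (input = "Rishabh Singh") then "Rishabh" else "Ben"',
              ["Rishabh Singh", "Rishabh", "Ben"]),
    Candidate('"Rishabh"', ["Rishabh"]),
    Candidate("1st Word", []),
]

ranked = sorted(candidates, key=cost)
for c in ranked:
    print(cost(c), c.description)
```

In a synthesizer, ranking is applied only to programs consistent with the examples; the constant program “Rishabh” would already be ruled out by the second example row.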

Challenges with Basic ranking scheme

Input            Output
Rishabh Singh    Singh, Rishabh
Ben Zorn         Zorn, Ben

•  2nd Word + “, ” + 1st Word
•  “Singh, Rishabh”

How to select between fewer larger constants vs. more smaller constants?

Idea: Associate numeric weights with constants.

Challenges with Basic ranking scheme

How to select between programs with the same number of same-sized constants?

Idea: Examine data features (in addition to program features)

Input                         Output
Missing page numbers, 1993    1993
64-67, 1995                   1995

•  1st Number from the beginning
•  1st Number from the end

Machine learning based ranking scheme

Rank score of a program: Weighted combination of various features.
•  Weights are learned using machine learning.

Program features
•  Number of constants
•  Size of constants

Features over user data: Similarity of generated output (or even intermediate values) over various user inputs
•  IsYear, Numeric Deviation, Number of characters
•  IsPersonName

“Predicting a correct program in Programming by Example”; [CAV 2015] Rishabh Singh, Sumit Gulwani
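A weighted feature score of this kind can be sketched in a few lines of Python. Everything specific below is an assumption for illustration: the two features (`is_year`, `length_spread`), the crude four-digit test standing in for IsYear, and the hand-picked weights, which in the actual scheme would be learned rather than set by hand.

```python
import re
from typing import Callable, Dict, List

def first_number_from_beginning(s: str) -> str:
    return re.findall(r"\d+", s)[0]

def first_number_from_end(s: str) -> str:
    return re.findall(r"\d+", s)[-1]

def is_year(s: str) -> bool:
    """Crude stand-in for an IsYear feature: exactly four digits."""
    return s.isdigit() and len(s) == 4

def features(program: Callable[[str], str], inputs: List[str]) -> Dict[str, float]:
    """Features computed over the outputs the program generates on user inputs."""
    outputs = [program(x) for x in inputs]
    lengths = [len(o) for o in outputs]
    return {
        "is_year": sum(is_year(o) for o in outputs) / len(outputs),
        "length_spread": float(max(lengths) - min(lengths)),
    }

# Illustrative weights only; not learned from data.
WEIGHTS = {"is_year": 1.0, "length_spread": -0.1}

def score(program: Callable[[str], str], inputs: List[str]) -> float:
    f = features(program, inputs)
    return sum(WEIGHTS[name] * f[name] for name in WEIGHTS)

inputs = ["Missing page numbers, 1993", "64-67, 1995"]
for program in (first_number_from_beginning, first_number_from_end):
    print(program.__name__, round(score(program, inputs), 2))
```

Both programs agree on the first input, so program features alone cannot separate them; the outputs on the second input are what favor “1st Number from the end”.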

Outline

•  Core Synthesis Architecture
    –  Domain-specific Language
    –  Search methodology
    –  Ranking function

•  Next generation Synthesis
    –  Interactive
    –  Predictive
    –  Adaptive

Programming-by-Examples Architecture

[Diagram components: Example-based intent; DSL D; Ranking fn; Program Synthesizer; Ranked program set (a sub-DSL of D); Test inputs; Debugging; Refined intent; Incrementality; Intended program in D; Translator; Intended program in R/Python/C#/C++/…]

Interactive Debugging

Predictive

•  Intended programs can sometimes be synthesized from just the input.
    –  Tabular data extraction, Sort, Join

•  Can save a large amount of user effort.
    –  User need not provide examples for each of tens of columns.

Adaptive

•  Learn from past interactions
    –  of the same user (personalized experience).
    –  of other users in the enterprise/cloud.

•  The synthesis sessions now require less interaction.

Programming-by-Examples Architecture

[Diagram components: Example-based intent; DSL D; Interaction history; Learner; Ranking fn; Program Synthesizer; Ranked program set (a sub-DSL of D); Test inputs; Debugging; Refined intent; Incrementality; Intended program in D; Translator; Intended program in R/Python/C#/C++/…]

PROSE Framework

https://microsoft.github.io/prose
•  Efficient implementation of the generic search methodology.
•  Provides a library of reduction rules.

Role of synthesizer developer
•  Implement a DSL with executable semantics for new operators.
•  Implement reduction rules (inverse semantics) for some operators.
•  Implement ranking strategy.
•  Can also specify tactics to resolve non-determinism in search.

The PROSE Team

Vu Le

Sumit Gulwani

Daniel Perelman

Danny Simmons

Mohammad Raza

Abhishek Udupa

Ranvijay Kumar

Alex Polozov

Mark Plesko

Please apply for intern/full-time positions!

Future Directions

•  Application to robotics

•  Learn from usage data

•  Probabilistic noise handling

•  Programming using natural language

Conclusion

•  Killer application: Data wrangling
    –  99% of end users are non-programmers.
    –  Data scientists spend 80% of their time cleaning data.

•  Algorithmic search
    –  Domain-specific language
    –  Divide-and-conquer methodology based on inverse functions

•  Ambiguity resolution
    –  Ranking
    –  Interactivity

Reference: “Programming by Examples (and its applications in Data Wrangling)”, In Verification and Synthesis of Correct and Secure Systems; IOS Press; 2016 [based on Marktoberdorf Summer School 2015 Lecture Notes]
