Sumit Gulwani Programming by Examples Applications, Algorithms & Ambiguity Resolution Stanford Lecture March 2017


Sumit Gulwani

Programming by Examples Applications, Algorithms & Ambiguity Resolution

Stanford Lecture March 2017


Motivation

99% of computer users cannot program! They struggle with simple repetitive tasks.


Spreadsheet help forums


Typical help-forum interaction

300_w5_aniSh_c1_b → w5

=MID(B1,5,2)

300_w30_aniSh_c1_b → w30

=MID(B1,FIND("_",$B:$B)+1, FIND("_",REPLACE($B:$B,1,FIND("_",$B:$B), ""))-1)
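The gap between the two forum answers can be reproduced in a few lines. This is a minimal Python sketch, not part of the lecture; the function names are illustrative and the slicing only mirrors what the two spreadsheet formulas compute.

```python
def fixed_offset(s: str) -> str:
    # Mirrors =MID(B1,5,2): two characters starting at 1-based position 5.
    return s[4:6]


def between_first_two_underscores(s: str) -> str:
    # Mirrors the longer FIND/REPLACE formula: the text between the
    # first and the second "_" in the cell.
    start = s.index("_") + 1
    end = s.index("_", start)
    return s[start:end]


for row in ["300_w5_aniSh_c1_b", "300_w30_aniSh_c1_b"]:
    print(row, fixed_offset(row), between_first_two_underscores(row))
```

The fixed-offset version fits the first example but truncates the second one, which is why the forum answer had to be revised once the second example appeared.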


Flash Fill (Excel 2013 feature) demo

“Automating string processing in spreadsheets using input-output examples”; POPL 2011; Sumit Gulwani


Number and DateTime Transformations

Input        Output (Half hour bucket)
0d 5h 26m    5:00-5:30
0d 4h 57m    4:30-5:00
0d 4h 27m    4:00-4:30
0d 3h 57m    3:30-4:00

Input        Output (Round to 2 decimal places)
123.4567     123.46
123.4        123.40
78.234       78.23

Format strings
Excel/C#:    #.00
Python/C:    .2f
Java:        #.##

Synthesizing Number Transformations from Input-Output Examples; CAV 2012; Singh, Gulwani
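For reference, the two transformations in the tables can be written by hand as follows. This is a minimal Python sketch of the target programs, not of the synthesizer; the function names and the input pattern are assumptions made for illustration.

```python
import re


def half_hour_bucket(duration: str) -> str:
    # Expects strings shaped like "0d 5h 26m". The day field is parsed but
    # ignored, because the example outputs on the slide do not show it.
    match = re.fullmatch(r"(\d+)d (\d+)h (\d+)m", duration)
    if match is None:
        raise ValueError(f"unexpected duration format: {duration!r}")
    hours, minutes = int(match.group(2)), int(match.group(3))
    if minutes < 30:
        return f"{hours}:00-{hours}:30"
    return f"{hours}:30-{hours + 1}:00"


def round_two_places(value: float) -> str:
    # ".2f" is the Python/C format string listed on the slide.
    return format(value, ".2f")


print(half_hour_bucket("0d 5h 26m"))
print(round_two_places(123.4))
```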


To get Started!

Data Science Class Assignment


FlashExtract


FlashExtract Demo

Ships inside two Microsoft products:
•  PowerShell: ConvertFrom-String cmdlet
•  Custom Log, Custom Field

“FlashExtract: A Framework for data extraction by examples”; PLDI 2014; Vu Le, Sumit Gulwani


Start with: End goal:

Trifacta facilitates the task through a series of steps, suggesting small transformations at each step:

From: Skills of the Agile Data Wrangler (tutorial by Hellerstein and Heer)

1. Split on “:” Delimiter
2. Delete Empty Rows
3. Fill Values Down
4. Pivot Number on Type

FlashRelate

Table Re-formatting


FlashRelate Demo


“FlashRelate: Extracting Relational Data from Semi-Structured Spreadsheets Using Examples”; PLDI 2015; Barowy, Gulwani, Hart, Zorn


Killer Application: Data Wrangling

Data is the new oil. Data scientists spend 80% of their time wrangling data.

Extraction
•  FlashExtract: Tabular data from log files [PLDI 2014]
•  Tabular data from spreadsheets [PLDI 2015]

Transformation
•  FlashFill: Syntactic String Transformations [POPL 2011]
•  Number Transformations [CAV 2012]
•  Date Transformations [POPL 2016]
•  Semantic String Transformations [VLDB 2012]
•  Normalization Transformations [IJCAI 2015]

Formatting
•  FlashRelate: Table layout transformations [PLDI 2011]
•  Repetitive editing in PowerPoint [AAAI 2014]


Programming-by-Examples Architecture

[Diagram: the Program Synthesizer takes Example-based Intent and DSL D, and produces a Program set (a sub-DSL of D).]


Domain-specific Language (DSL)

•  Balanced Expressiveness
   –  Expressive enough to cover a wide range of tasks
   –  Restricted enough to enable efficient search
•  Restricted set of operators
   –  those with small inverse sets
•  Restricted syntactic composition of those operators
•  Natural computation patterns
   –  Increased user understanding/confidence
   –  Enables selection between programs, editing


Flash Fill DSL (String Transformations)

top-level expr        T := C | if-then-else(B, C, T)
condition-free expr   C := A | Concatenate(A, C)
atomic expression     A := SubStr(X, P, P) | ConstantString
input string          X
position expression   P := K | Pos(X, R1, R2, K)
Boolean expression    B

Pos(X, R1, R2, K): Kth position in X whose left/right side matches with R1/R2.
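To make the semantics of the condition-free part of this DSL concrete, here is a minimal interpreter sketch in Python. It is not the Flash Fill implementation: the class layout is an assumption, there is a single input string (so SubStr omits its X argument), R1 and R2 are treated as ordinary Python regular expressions, and K counts matches from 1.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ConstantString:
    value: str


@dataclass(frozen=True)
class Pos:
    left: str   # regex R1 that must match immediately before the position
    right: str  # regex R2 that must match immediately after the position
    k: int      # 1-based index among all matching positions


@dataclass(frozen=True)
class SubStr:
    start: Union[int, Pos]
    end: Union[int, Pos]


@dataclass(frozen=True)
class Concatenate:
    first: Union[ConstantString, SubStr]
    rest: "Expr"


Expr = Union[ConstantString, SubStr, Concatenate]


def eval_pos(p, x: str) -> int:
    # A constant position K is used as-is; Pos scans every index of x.
    if isinstance(p, int):
        return p
    hits = [i for i in range(len(x) + 1)
            if re.search(f"(?:{p.left})\\Z", x[:i]) and re.match(p.right, x[i:])]
    return hits[p.k - 1]


def evaluate(expr, x: str) -> str:
    if isinstance(expr, ConstantString):
        return expr.value
    if isinstance(expr, SubStr):
        return x[eval_pos(expr.start, x):eval_pos(expr.end, x)]
    if isinstance(expr, Concatenate):
        return evaluate(expr.first, x) + evaluate(expr.rest, x)
    raise TypeError(f"unknown expression: {expr!r}")


# "From just after the 1st '_' up to just before the 2nd '_'".
program = SubStr(Pos("_", "", 1), Pos("", "_", 2))
print(evaluate(program, "300_w5_aniSh_c1_b"))
print(evaluate(program, "300_w30_aniSh_c1_b"))
```

The same program handles both help-forum rows because its positions are defined relative to the underscores rather than as fixed offsets.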


FlashExtract DSL (Sequence Extraction)

String d → List(String)

Seq expr E              := Map(N, λz: S[z])
                         | Merge(E1, E2)
some lines N            := FilterByPosition(L, init, iter)
                         | Filter(L, λz: F[z])
                         | Filter(L, λy: F[prevLine(y)])
line filter function F[z] := Contains(z,r,K)
                         | startsWith(z,r)
all lines L             := Split(d,"\n")
substr expr S[z]        := …

“FlashExtract: A Framework for data extraction by examples”; PLDI 2014; Vu Le, Sumit Gulwani


Search Methodology

Goal: Set of expr of kind e that satisfies spec φ [denoted [e ⊨ φ]]

e: DSL (top-level) expression

φ: Conjunction of (input state σ, output value v) [denoted σ ⇝ v]

Methodology: Based on divide-and-conquer style problem decomposition.

•  [e ⊨ φ] is reduced to simpler problems (over sub-expressions of e or sub-constraints of φ).

•  Top-down (as opposed to bottom-up enumerative search).

“FlashMeta: A Framework for Inductive Program Synthesis”; [OOPSLA 2015] Alex Polozov, Sumit Gulwani


Problem Reduction Rules

Let e be a non-terminal defined as e := e₁ | e₂

[e ⊨ φ] = Union([e₁ ⊨ φ], [e₂ ⊨ φ])

[e ⊨ φ₁ ∧ φ₂] = Intersect([e ⊨ φ₁], [e ⊨ φ₂])

An alternative strategy:

[e ⊨ φ₁ ∧ φ₂] = [e′ ⊨ φ₂], where e′ = Top Symbol([e ⊨ φ₁])

[e ⊨ φ₁ ∨ φ₂] = Union([e ⊨ φ₁], [e ⊨ φ₂])
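The Union/Intersect reductions can be checked on a toy program space. This is a minimal Python sketch, with explicit Python sets standing in for the compact program-set representation a real synthesizer would use; the four named programs and the examples are hypothetical.

```python
# A tiny, explicitly enumerated program space (hypothetical programs).
PROGRAMS = {
    "first_word": lambda s: s.split()[0],
    "last_word": lambda s: s.split()[-1],
    "upper": lambda s: s.upper(),
    "identity": lambda s: s,
}
ALL = set(PROGRAMS)


def satisfying(names, examples):
    # [e ⊨ φ]: the programs among `names` consistent with every example in φ.
    return {n for n in names
            if all(PROGRAMS[n](inp) == out for inp, out in examples)}


phi1 = [("Ben Zorn", "Ben")]
phi2 = [("Ben", "Ben")]

# Conjunction: intersect the two solution sets ...
by_intersect = satisfying(ALL, phi1) & satisfying(ALL, phi2)
# ... or, alternatively, solve φ2 only over the programs that already solve φ1.
by_refinement = satisfying(satisfying(ALL, phi1), phi2)

# Alternation e := e1 | e2: solve each sub-grammar and take the union.
e1, e2 = {"first_word", "last_word"}, {"upper", "identity"}
by_union = satisfying(e1, phi2) | satisfying(e2, phi2)

print(by_intersect, by_refinement, by_union)
```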


Problem Reduction Rules

Inverse Set: Let F be an n-ary operator.

F⁻¹(v) = {(u₁, …, uₙ) | F(u₁, …, uₙ) = v}

[F(e₁, …, eₙ) ⊨ σ ⇝ v] = Union({F([e₁ ⊨ σ ⇝ u₁], …, [eₙ ⊨ σ ⇝ uₙ]) | (u₁, …, uₙ) ∈ F⁻¹(v)})

Concat⁻¹("Abc") = {("Abc", ϵ), ("Ab", "c"), ("A", "bc"), (ϵ, "Abc")}

[Concat(X, Y) ⊨ (σ ⇝ "Abc")] = Union({
  Concat([X ⊨ (σ ⇝ "Abc")], [Y ⊨ (σ ⇝ ϵ)]),
  Concat([X ⊨ (σ ⇝ "Ab")], [Y ⊨ (σ ⇝ "c")]),
  Concat([X ⊨ (σ ⇝ "A")], [Y ⊨ (σ ⇝ "bc")]),
  Concat([X ⊨ (σ ⇝ ϵ)], [Y ⊨ (σ ⇝ "Abc")])})

Problem Reduction Rules

Inverse Set: Let F be an n-ary operator.

F⁻¹(v) = {(u₁, …, uₙ) | F(u₁, …, uₙ) = v}

Concat⁻¹("Abc") = {("Abc", ϵ), ("Ab", "c"), ("A", "bc"), (ϵ, "Abc")}

IfThenElse⁻¹(v) = {(true, v, ⊤), (false, ⊤, v)}

Consider the SubStr(x,i,j) operator

SubStr⁻¹("Ab") = …

The notion of inverse sets (and corresponding reduction rules) can be generalized to the conditional setting.

SubStr⁻¹("Ab" | x = "Ab cd Ab") = {(0,2), (6,8)}
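Both inverse sets on this slide are small enough to compute directly. A minimal Python sketch follows; the function names are illustrative, and the empty Python string plays the role of ϵ.

```python
def concat_inverse(out: str):
    # Every way to split `out` into a (left, right) pair whose
    # concatenation gives `out` back.
    return [(out[:i], out[i:]) for i in range(len(out), -1, -1)]


def substr_inverse(out: str, x: str):
    # Conditional inverse: all (i, j) with x[i:j] == out, given the input x.
    n = len(out)
    return [(i, i + n) for i in range(len(x) - n + 1) if x[i:i + n] == out]


print(concat_inverse("Abc"))
print(substr_inverse("Ab", "Ab cd Ab"))
```

Without knowing x, SubStr has no finite inverse for "Ab", which is why the conditional form is needed.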


Generalizing output values to output properties

User specification
•  Examples: Conjunction of (input state, output value)
•  Inductive Spec: Conjunction of (input state, output property)

Search methodology
•  (Conditional) Inverse sets: Back-propagation of values
•  (Conditional) Witness functions: Back-propagation of properties


Problem Reduction

FlashExtract DSL
list of strings T := Map(L,S)
substring fn S    := λy: …
list of lines L   := Filter(Split(d,"\n"), B)
boolean fn B      := λy: …

Spec for T → Spec for L ⋈ Spec for S


Problem Reduction

SubStr grammar
substring expr E := SubStr(y,P1,P2)
position expr P  := K | Pos(y,R1,R2,K)

Example string (from the slide figure): Redmond, WA

Spec for E → Spec for P1 ⋈ Spec for P2


Programming-by-Examples Architecture

[Diagram: the Program Synthesizer takes Example-based Intent, DSL D, and a Ranking fn, and produces a Ranked Program set (a sub-DSL of D).]


Basic ranking scheme

Prefer programs with lower Kolmogorov complexity
•  Prefer fewer constants.
•  Prefer smaller constants.

Input            Output
Rishabh Singh    Rishabh
Ben Zorn         Ben

Candidate programs:
•  1st Word
•  If (input = “Rishabh Singh”) then “Rishabh” else “Ben”
•  “Rishabh”
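The two preferences can be turned into a sort key. This is a minimal Python sketch; ordering first by number of constants and then by their total length is an assumption made for illustration, and the candidate programs are represented only by the string constants they contain.

```python
# Each candidate program is described by the string constants it uses.
candidates = {
    "1st Word": [],
    'If (input = "Rishabh Singh") then "Rishabh" else "Ben"':
        ["Rishabh Singh", "Rishabh", "Ben"],
    '"Rishabh"': ["Rishabh"],
}


def cost(constants):
    # Fewer constants first, then smaller constants (assumed tie-break order).
    return (len(constants), sum(len(c) for c in constants))


ranked = sorted(candidates, key=lambda name: cost(candidates[name]))
for name in ranked:
    print(cost(candidates[name]), name)
```

Under this key the constant-free program "1st Word" ranks first, which matches the generalization a user would expect from the two rows.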


Challenges with Basic ranking scheme

Prefer programs with lower Kolmogorov complexity
•  Prefer fewer constants.
•  Prefer smaller constants.

How to select between Fewer larger constants vs. More smaller constants?

Input            Output
Rishabh Singh    Singh, Rishabh
Ben Zorn         Zorn, Ben

•  2nd Word + “, ” + 1st Word
•  “Singh, Rishabh”

Idea: Associate numeric weights with constants.


Challenges with Basic ranking scheme

Prefer programs with lower Kolmogorov complexity
•  Prefer fewer constants.
•  Prefer smaller constants.

How to select between Same number of same-sized constants?

Input                         Output
Missing page numbers, 1993    1993
64-67, 1995                   1995

•  1st Number from the beginning
•  1st Number from the end

Idea: Examine data features (in addition to program features)


Machine learning based ranking scheme

Rank score of a program: Weighted combination of various features.
•  Weights are learned using machine learning.

Program features
•  Number of constants
•  Size of constants

Features over user data: Similarity of generated output (or even intermediate values) over various user inputs
•  IsYear, Numeric Deviation, Number of characters
•  IsPersonName

“Predicting a correct program in Programming by Example”; [CAV 2015] Rishabh Singh, Sumit Gulwani
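A weighted feature score of this kind might look as follows. This is a minimal Python sketch, not the method of the cited paper: the weights are hand-picked placeholders rather than learned values, only an IsYear-style data feature is modeled, and the two candidate programs are written directly as Python functions.

```python
import re

# Hypothetical, hand-set weights; in the lecture these are learned.
WEIGHTS = {"num_constants": -1.0, "constant_chars": -0.1, "year_fraction": 2.0}


def is_year(s: str) -> bool:
    # Crude IsYear feature: exactly four digits.
    return s.isdigit() and len(s) == 4


def first_number_from_start(s: str) -> str:
    return re.findall(r"\d+", s)[0]


def first_number_from_end(s: str) -> str:
    return re.findall(r"\d+", s)[-1]


def score(program, constants, inputs) -> float:
    # Program features plus one feature computed over the outputs on user data.
    outputs = [program(i) for i in inputs]
    features = {
        "num_constants": len(constants),
        "constant_chars": sum(len(c) for c in constants),
        "year_fraction": sum(is_year(o) for o in outputs) / len(outputs),
    }
    return sum(WEIGHTS[name] * value for name, value in features.items())


inputs = ["Missing page numbers, 1993", "64-67, 1995"]
print(score(first_number_from_start, [], inputs))
print(score(first_number_from_end, [], inputs))
```

Both programs agree on the first row, so program features alone cannot separate them; the data feature prefers the one whose outputs look like years on every input.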


Outline

•  Core Synthesis Architecture
   –  Domain-specific Language
   –  Search methodology
   –  Ranking function
•  Next generation Synthesis
   –  Interactive
   –  Predictive
   –  Adaptive


Programming-by-Examples Architecture

[Diagram components: Example-based Intent, DSL D, Ranking fn, Program Synthesizer, Ranked Program set (a sub-DSL of D), Test inputs, Debugging, Refined Intent, Incrementality, Intended Program in D, Translator, Intended Program in R/Python/C#/C++/…]


Interactive Debugging


Predictive

•  Intended programs can sometimes be synthesized from just the input.
   –  Tabular data extraction, Sort, Join
•  Can save a large amount of user effort.
   –  User need not provide examples for each of tens of columns.


Adaptive

•  Learn from past interactions
   –  of the same user (personalized experience).
   –  of other users in the enterprise/cloud.
•  The synthesis sessions now require less interaction.


Programming-by-Examples Architecture

[Diagram components: Example-based Intent, DSL D, Ranking fn, Program Synthesizer, Ranked Program set (a sub-DSL of D), Test inputs, Debugging, Refined Intent, Incrementality, Interaction history, Learner, Intended Program in D, Translator, Intended Program in R/Python/C#/C++/…]


PROSE Framework

https://microsoft.github.io/prose
•  Efficient implementation of the generic search methodology.
•  Provides a library of reduction rules.

Role of synthesizer developer
•  Implement a DSL with executable semantics for new operators.
•  Implement reduction rules (inverse semantics) for some operators.
•  Implement ranking strategy.
•  Can also specify tactics to resolve non-determinism in search.


The PROSE Team

Vu Le

Sumit Gulwani

Daniel Perelman

Danny Simmons

Mohammad Raza

Abhishek Udupa

Ranvijay Kumar

Alex Polozov

Mark Plesko

Please apply for intern/full-time positions!


Future Directions

•  Application to robotics
•  Learn from usage data
•  Probabilistic noise handling
•  Programming using natural language


Conclusion

•  Killer application: Data wrangling
   –  99% of end users are non-programmers.
   –  Data scientists spend 80% of their time cleaning data.
•  Algorithmic search
   –  Domain-specific language
   –  Divide-and-conquer methodology based on inverse functions
•  Ambiguity resolution
   –  Ranking
   –  Interactivity

Reference: “Programming by Examples (and its applications in Data Wrangling)”, In Verification and Synthesis of Correct and Secure Systems; IOS Press; 2016 [based on Marktoberdorf Summer School 2015 Lecture Notes]